PREFACE

This report describes part of a comprehensive and continuing program of 
research in multispectral remote sensing of the environment from aircraft and 
satellites. The research is being carried out for NASA’s Lyndon B. Johnson
Space Center, Houston, Texas, by the Environmental Research Institute of Michigan
(formerly the Willow Run Laboratories, a unit of The University of Michigan’s 
Institute of Science and Technology). The basic objective of this program is to 
develop remote sensing as a practical tool for obtaining extensive environmental 
information quickly and economically. 

In recent times, many new applications of multispectral sensing have come 
into being. These include agricultural census-taking, detection of diseased plants,
urban land studies, measurement of water depth, studies of air and water pollution,
and general assessment of land-use patterns. Yet the techniques employed remain 
limited by the resolution capability of a multispectral scanner. Techniques 
described in this report may help to overcome this limitation. They may produce 
more accurate estimates of target classes in a scene when a significant number of 
pixels are on boundaries. 

To date, our work on estimation of proportions has included: (1) extension
of the signature concept to a mixture of ground materials; (2) development of a
statistical and geometric model for sets and mixtures of signatures; (3) evaluation
of computational methods used to estimate proportions of a mixture by maximum
likelihood; (4) creation of a computational technique for assessing the expected
accuracy of estimation as a function of the signature set; (5) development of
techniques to identify alien objects; (6) testing and evaluating the proportion
estimation algorithms on artificial as well as actual multispectral scanner data;
(7) extension of the basic proportion estimation techniques to exploit prior and
spatial information; and (8) preliminary evaluation of these extensions on
space-gathered multispectral scanner data.

The research covered in this report was performed under Contract NAS9-14123,
Task IV, and covers the period from 15 May 1974 through 14 March 1975. Dr. Andrew
Potter has been Technical Monitor for NASA, and Dr. A.H. Feiveson has been
Task Monitor. The program was directed by R.R. Legault, Vice-President of the
Environmental Research Institute of Michigan (ERIM); J.D. Erickson, Project
Director and Head of the ERIM Information Systems and Analysis Department;
and R.F. Nalepka, Principal Investigator and Head of the ERIM Multispectral
Analysis Section. The ERIM number for this report is 109600-13-F.

The authors acknowledge the direction provided by Mr. R.R. Legault,
Dr. J.D. Erickson, and Mr. R.F. Nalepka; the technical counsel furnished by
Mr. R.J. Kauth, Dr. R.B. Crane, Dr. W. Richardson, and Dr. W.A. Malila; and
the secretarial services of Mrs. L.A. Parker, Miss G. Sotomayor, and
Miss D. Dickerson.

SUMMARY

The potential applications of remote sensing are numerous. However, 
some of these applications are hampered by the limited spatial resolution 
of the sensing device. To surmount this difficulty, procedures have 
been developed to permit more accurate estimates of proportions of target 
classes in a scene when there are a significant number of boundary pixels. 

This report covers a fourth phase in the development of proportion
estimation techniques. In the first three phases, a basic solution to
the problem was developed and tested, first on artificial data and later,
when it became available, on actual space data. Along with the estimation
technique, two ancillary developments were pursued: (1) a statistical
test to detect pixels containing alien (unknown) materials, and (2) a
geometrical test on the signature set to determine the suitability of
the associated data set for proportion estimation processing.

Experience with processing actual space data led to two extensions 
of the basic proportion estimation technique. These extensions constitute 
the fourth phase reported herein. One of them (LIMMIX) incorporates 
prior information in that it is based on the assumption that the number 
of object classes that can occur simultaneously in a pixel is very limited. 
The other (nine-point mixtures) is also based on this concept but, in
addition, utilizes spatial information. For a particular pixel, this
spatial information is extracted from the signals of the adjoining pixels.

Along with these two extensions, suitable alien object detection 
procedures were devised. Also, a geometrical test of the signature set 
was constructed for determining the suitability of the associated data for 
LIMMIX or nine-point mixtures processing. In addition, it was found 
necessary to develop a clustering procedure for obtaining signatures when 
the training fields were narrow. LIMMIX and nine-point mixtures have an important
advantage over the older procedure (MIXMAP). Whereas for MIXMAP the size
of the signature set can be no larger than the number of spectral channels
plus one, for LIMMIX and nine-point mixtures the size of the signature set
may, in principle, be unlimited even when the number of spectral channels
is as low as two.

Preliminary tests of LIMMIX and nine-point mixtures were made on
space data, and the results were superior to those obtained by conventional
recognition processing or the previous proportion estimation procedure.

Further investigation is required for solving the problem of setting the 
parameters of the procedures. Also, it appears that additional experimentation 
with multiple signatures for single object classes would be fruitful. 


INTRODUCTION

In recent years the staff at ERIM has participated in the development 
of various techniques for multispectral remote sensing applications, including 
agricultural land use measurement, geologic classification, and water depth
measurement.

In conventional multispectral recognition, the total area of each
ground material is measured by identifying the material in each ground
area (pixel) covered by one resolution element of a multispectral scanner.
The total area covered by a ground material is found by adding up the
pixels identified with that material. If almost every pixel in the ground
scene contains just one of the possible materials, this technique provides
adequate estimates of acreages. However, if a pixel contains substantial
amounts of more than one material, the pixel cannot be properly classified.
For LANDSAT satellite data over agricultural scenes, in which each pixel
covers about 1.1 acres, the number of pixels containing significant portions
of more than one material may approach 30% of the total.

The purpose of the present effort is to obtain improved area estimates 
of ground materials in these cases. We attempt to overcome the problem 
of boundary pixels in two ways. First, we determine which pixels are 
likely to be on a boundary. Then, for these, we estimate the proportion 
of materials within. 

Since its inception, this effort has consisted of a mix of theoretical 
model studies and tests with both simulated data and modest amounts of 
ground-truthed real data. Now that real data sets with adequate associated 
ground truth are becoming available, we are using these exclusively in 
testing and developing mixtures procedures. The past history of the effort
is summarized below to provide a context for this report.

Our work on estimation of proportions was accomplished in several
phases. In the first phase [1,2], a mathematical model was constructed
which related the multispectral signatures of a mixture to the signatures
of component materials. This model permitted the maximum likelihood
estimate of the proportion vector to be formulated in terms of the observed
data point. The computational aspects of the problem required this
simplification: that all of the covariance matrices of the signatures of
the component materials be taken as equal to their average. Theoretical
and empirical results supported the validity of this assumption. With
this simplification, proportion estimation becomes a quadratic programming
problem. Several existing computational methods of quadratic programming
were adapted and tested on simulated scanner data. Results indicated
that this method for proportion estimation was feasible.

The second phase of the program [3] included investigating the problem
of detecting alien objects, i.e., objects in the scene not represented
in the signature set. A procedure was devised for rejecting those pixels
which probably contained significant amounts of alien materials. In
addition, aircraft scanner data were smoothed over LANDSAT-sized resolution
elements to simulate spaceborne scanner data. When proportion estimation
techniques were tested on this data, estimates of crop acreage based on
the estimated proportions were found to be better than estimates obtained
with conventional recognition techniques.

The third phase of the program was devoted largely to reducing
computation time required for the procedures. This was accomplished by
improving the basic algorithm. It takes about 20 msec on an IBM 7094 computer
to process a LANDSAT signal, assuming there are five signatures. In order to
reduce processing time still further, averaging procedures were considered.
Averaging improved the speed of estimation by a factor approximately equal
to the number of points included in the average; but accuracy of estimation,
contrary to theoretical expectations, was unsatisfactory. During this
phase, satellite data with associated ground truth information became
available. Testing of the procedures on this data, as well as results of
other investigators [5,6,7], suggested extensions of the basic proportion
estimation procedure.

[1] Horwitz, H.M., R.F. Nalepka, P.D. Hyde, and J.P. Morgenstern, 1971,
Estimating the Proportions of Objects Within a Single Resolution
Element of a Multispectral Scanner, Seventh International Symposium
on Remote Sensing of Environment, Report No. 10259-1-X, May 1971,
Willow Run Laboratories of the Institute of Science and Technology,
The University of Michigan, Ann Arbor.

[2] Nalepka, R.F., H.M. Horwitz, and P.D. Hyde, 1972, Estimating
Proportions of Objects From Multispectral Data, Report No. 31650-73-T,
Willow Run Laboratories of the Institute of Science and Technology,
The University of Michigan, Ann Arbor.

[3] Nalepka, R.F., and P.D. Hyde, 1973, Estimating Crop Acreage From
Space-Simulated Multispectral Scanner Data, Report No. 31650-148-T,
Environmental Research Institute of Michigan, Ann Arbor.

Investigation of two extensions constitutes the fourth phase of our 
program and covers the period of this report. One extension is based on 
the assumption that the number of object classes that can occur simultaneously 
in a single pixel is very limited. Although our experimental computer
program (called LIMMIX) permits taking this limit as large as 4, experience
shows that two is an effective value. The other extension (called "nine-point
mixtures") incorporates this limiting concept but, in addition,
utilizes spatial information. For a particular pixel, this spatial
information is extracted from the signals of adjoining pixels.

These two procedures, LIMMIX and nine-point mixtures, have an important
advantage over the original proportion estimation procedure, MIXMAP. A
necessary requirement for MIXMAP processing is that the size of the
signature set be no larger than the number of spectral channels plus one.
However, for LIMMIX and nine-point mixtures, the size of the signature set
may be, in principle, unlimited even when the number of spectral channels
is as low as two.


[5] Malila, W.A., and R.F. Nalepka, 1973, Atmospheric Effects in ERTS-1
Data and Advanced Information Extraction Techniques, Symposium on
Significant Results Obtained From the Earth Resources Technology
Satellite-1, Vol. 1, Goddard Space Flight Center, Greenbelt, Md.

[6] Thomson, F.J., 1973, Crop Species Recognition and Mensuration
in the Sacramento Valley, Symposium on Significant Results Obtained
From the Earth Resources Technology Satellite-1, Vol. 1, Goddard
Space Flight Center, Greenbelt, Md.

[7] Richardson, W., 1974, A Study of Some Nine-Element Decision Rules,
Report No. 190100-32-T, Environmental Research Institute of Michigan,
Ann Arbor.

Preliminary tests of these new procedures were made on ERTS data
sets. One scene contained a number of lakes and ponds and the objective 
of the tests was to measure the surface water acreage. The other scenes 
were agricultural with selected target crops. Results were encouraging. 

The next section reviews our basic approach to proportion estimation. 
The LIMMIX procedure is explained in Section 4 and results of tests are
presented. Section 5 contains a description of the nine-point mixtures
algorithm. It also contains comparison tests of this procedure with
selected other procedures. The more burdensome details of all
sections have been relegated to appendices.

APPROACH TO PROPORTION ESTIMATION


A basic application of remote sensing is the determination of the
proportion of a scene covered by a target class (object class of interest).
For example, what proportion of a 5 x 20 mi. segment of Fayette County,
Illinois was covered by wheat on 12 June 1973? The usual approach to
obtaining an estimate of the proportions of target classes in a scene is
based on the assumption that each pixel contains a single object class.
For multispectral data gathered at space altitudes, we know that pixel
size is relatively large compared to field size for a typical agricultural
scene, and that often 30% of the pixels may be boundary pixels (pixels which
contain more than one object class). Reference [3] contains a discussion
of the mechanism by which errors are introduced into the estimate of the
proportions of target classes by processing procedures which do not
account for boundary pixels.

To the best of our knowledge, ERIM was the first to take into account
boundary pixels by associating signatures with mixtures of object classes [1,2].
Later Detchmendy and Pace [8] published an approach which was quite similar
(see Reference [9] for a comparison of the methods). More recently, H. O. Hartley
has suggested a modified moment method approach to account for boundary
pixels. Many other current methods for proportion estimation (see, for
example, [10]) take as a model what is termed "mixtures of distributions"
in the statistical literature. This model does not account for boundary
pixels.

[8] Detchmendy, D.M., and W.H. Pace, 1972, A Model for Spectral
Signature Variability for Mixtures, Earth Resources Observation and
Information Analysis Systems Conference, Tullahoma, Tennessee.

[9] Salvato, Jr., P., 1973, Iterative Techniques to Estimate Signature
Vectors for Mixture Processing of Multispectral Data, Conference
on Machine Processing of Remotely Sensed Data, Purdue University,
Lafayette, Indiana.

This section sketches the basis of ERIM's approach to proportion
estimation. Included is a discussion of the correlation assumption 
implicit in the model for signatures of mixtures of object classes within 
a single pixel. Evidence supporting the validity of this assumption for 
LANDSAT-size pixels is presented. The procedure for estimating the 
proportions of object classes within a pixel is then explained and the 
rationale for making the simplifying assumption of equal covariance 
matrices of the signatures is presented. Finally, possible fruitful
extensions of the basic proportion estimation procedures are discussed. 

3.1 MODEL FOR SIGNATURES OF MIXTURES 

When the IFOV (Instantaneous Field of View) of a multispectral scanner
is large with respect to the structure of the scene being scanned, a single
resolution cell (pixel) may contain more than a single object or material.
A mathematical model has been constructed which relates the signature of
a mixture of materials to the signatures of the component materials.

Suppose the scanner has $n$ spectral channels and that the signature of
object class $i$, where $1 \le i \le m$, is represented by the $n$-dimensional Gaussian
distribution with mean $A_i$ and covariance matrix $M_i$. Let the proportion
of object class $i$ be $\lambda_i$ and let $\lambda$ be the vector
$(\lambda_1, \lambda_2, \ldots, \lambda_m)^t$, where the
superscript $t$ denotes transpose. The signature of the mixture with proportion
vector $\lambda$ is taken to be a Gaussian distribution, with mean and covariance
matrix given by

$$A_\lambda = \sum_{i=1}^{m} \lambda_i A_i = A\lambda$$

$$M_\lambda = \sum_{i=1}^{m} \lambda_i M_i$$

where $A$ is the matrix with $i$-th column $A_i$. These formulas constitute our
model for signatures of mixtures of materials in terms of signatures of
the individual materials.

[10] Odell, P.L., J.P. Basu, and W. Coberly, 1974, Concerning Several
Methods for Estimating Crop Acreage Using Remote Sensing Data,
Progress Report June 1, 1974-August 31, 1974, The University
of Texas at Dallas.
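As an illustration only, the mixture model can be written as a short Python sketch: the mixture mean and covariance are formed as proportion-weighted sums of the pure signature means and covariance matrices. The function name `mixture_signature`, the material names, and the two-channel numbers are hypothetical and are not taken from the ERIM programs.

```python
def mixture_signature(means, covs, proportions):
    """Return (mean, covariance) of a mixture signature.

    means       -- list of m mean vectors, each of length n
    covs        -- list of m covariance matrices, each n-by-n (lists of lists)
    proportions -- list of m nonnegative proportions summing to 1
    """
    m, n = len(means), len(means[0])
    if len(covs) != m or len(proportions) != m:
        raise ValueError("means, covs and proportions must have equal length")
    if any(p < 0 for p in proportions) or abs(sum(proportions) - 1.0) > 1e-9:
        raise ValueError("proportions must be nonnegative and sum to 1")
    # Mean of the mixture: proportion-weighted sum of the pure means.
    mean = [sum(p * a[k] for p, a in zip(proportions, means)) for k in range(n)]
    # Covariance of the mixture: proportion-weighted sum of the pure covariances.
    cov = [[sum(p * c[r][s] for p, c in zip(proportions, covs))
            for s in range(n)] for r in range(n)]
    return mean, cov


# Hypothetical two-channel signatures for two materials (illustrative values).
wheat_mean, wheat_cov = [30.0, 50.0], [[4.0, 1.0], [1.0, 9.0]]
soil_mean, soil_cov = [60.0, 20.0], [[16.0, 0.0], [0.0, 1.0]]

mix_mean, mix_cov = mixture_signature(
    [wheat_mean, soil_mean], [wheat_cov, soil_cov], [0.25, 0.75])
print(mix_mean, mix_cov)
```

A pixel that is one quarter of the first material and three quarters of the second thus has a signature mean on the line segment joining the two pure means, one quarter of the way back from the second toward the first.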

3.2 ERIM CORRELATION ASSUMPTION

Examination of the derivation of the model given in Reference [2],
Section 2.1, reveals that it is assumed that the correlation is zero
between random variables associated with signals from nonoverlapping small
areas in a pixel. Critics have pointed to this as being a serious flaw.
R. Crane of ERIM suggested an experiment to test the extent of the validity
of the ERIM correlation assumption. The general idea is as follows. From
aircraft data, select a number of fields containing the same crop type.
Use field center pixels only and assume that the correlation function of
the signals from the pixels depends only on the distance between the
pixels. Estimate the correlation function for selected channels of data.
If we find that the correlation distance is small relative to the size of
a LANDSAT-size pixel, then the ERIM model would be validated to some extent
for LANDSAT-size pixels. Although the details of the experiment appear
straightforward, there are two complicating factors: between-field
variations and scan angle effects.

In order to minimize the effect of the first factor, an estimate of
the correlation function is made for each field separately and then an
average taken over all fields. In order to reduce the effect of the second
factor, estimates do not utilize pairs of observations along lines of data,
only between lines. Also, a sample mean and variance are used for each
angle in a field. Details of the estimation procedure are contained in
Appendix A.
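The experiment outlined above might be sketched roughly as follows. This is not the Appendix A procedure, which is not reproduced here; it is a minimal Python sketch under stated assumptions: each field is a rectangular array of single-channel signal values (one row per scan line, one column per scan angle), only pairs of pixels in the same column are used, and each column is standardized with its own sample mean and variance. The function names are hypothetical.

```python
def between_line_correlation(field, max_lag):
    """Estimate correlation versus line separation for one field.

    field   -- list of scan lines, each a list of signal values (one channel)
    max_lag -- largest separation, in scan lines, to estimate
    Pairs are formed only within a column (same scan angle, different
    lines), and each column uses its own sample mean and variance.
    """
    n_lines, n_cols = len(field), len(field[0])
    if max_lag >= n_lines:
        raise ValueError("max_lag must be smaller than the number of scan lines")
    sums = [0.0] * (max_lag + 1)
    counts = [0] * (max_lag + 1)
    for c in range(n_cols):
        column = [field[r][c] for r in range(n_lines)]
        mean = sum(column) / n_lines
        var = sum((v - mean) ** 2 for v in column) / n_lines
        if var == 0.0:
            continue  # a constant column carries no correlation information
        std = var ** 0.5
        z = [(v - mean) / std for v in column]
        for lag in range(max_lag + 1):
            for r in range(n_lines - lag):
                sums[lag] += z[r] * z[r + lag]
                counts[lag] += 1
    if counts[0] == 0:
        raise ValueError("field has no variation in any column")
    return [s / k for s, k in zip(sums, counts)]


def average_over_fields(fields, max_lag):
    """Average the per-field correlation estimates, lag by lag."""
    per_field = [between_line_correlation(f, max_lag) for f in fields]
    return [sum(rho[lag] for rho in per_field) / len(per_field)
            for lag in range(max_lag + 1)]
```

Estimating each field separately before averaging keeps between-field differences in mean signal level from appearing as spurious correlation, and working column by column keeps scan angle effects out of the estimate.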


The correlation assumption of the ERIM mixtures model was tested
accordingly. The data used was from segment 203 of the Corn Blight Watch 
Experiment gathered by aircraft at 5000 ft. over Indiana on 13 August, 1973. 
Seven large fields were chosen at random for the correlation test. For 
each field and each of four channels,* correlations were computed for
distances of up to 47 aircraft pixels, or slightly less than three LANDSAT
satellite pixels. The average correlation per channel for all the fields
was calculated and plotted. 

Figure 1 shows that each of these plots quickly falls to near zero.
As separations become large, there are fewer correlation measurements that
can be made. Thus, at large distances, this correlation test becomes
statistically unreliable. In channel 4 there is clearly some sinusoidal
noise superimposed on the signal.**

Figure 2 shows correlation curves of four individual fields in channel
1. They appear to be random when compared to the average curve of channel
1 in Figure 1. The other channels displayed as much or more randomness.

The results of this test, as displayed in Figure 1, support the validity 
of the correlation assumption in the ERIM model with respect to LANDSAT 
data. The correlation falls to near zero in a distance that is small 
with respect to the size of a LANDSAT resolution element. This closely 
approximates the model’s assumption of no correlation between signals from 
different locations within a LANDSAT pixel. Figure 2 shows that what 
little correlation there is cannot be used as a correction to the mixtures 
model because the correlation function seems to be a random variable on 
a field by field basis. 


*10-channel aircraft data was used for the correlation test. To limit
the test to a reasonable amount of computation time, only the first
4 channels were used. It was felt that four was enough to make the
correlation test valid, although eventually the longer wavelength
bands should be checked.

**The peaks are separated by more than 3 aircraft pixels, which rules
out row structure as the reason for the sinusoidal pattern.

After our experiment had been performed, we learned that Coberly [11]
of NASA/JSC had previously conducted a similar investigation. His pixel
size was approximately 12 feet. The details of the experiment varied
from ours in that he used a single large rye field and a slightly different
estimation procedure. Nevertheless, our results and his were very close.
Thus we have additional evidence of the validity of the ERIM correlation
assumption for LANDSAT data.

3.3 ESTIMATION OF PROPORTIONS (MIXMAP PROCEDURE) 

The model for a mixture signature can be used to estimate the proportion
vector corresponding to a signal data vector from a multispectral scanner.
Let $y$ denote the $n$-dimensional data vector from the scanner. A maximum
likelihood estimate of the proportion vector [2] is a value of $\lambda$ which
minimizes

$$F(\lambda) = \ln|M_\lambda| + \langle y - A_\lambda,\; M_\lambda^{-1}(y - A_\lambda) \rangle$$

subject to the constraints that

$$\sum_{i=1}^{m} \lambda_i = 1 \quad \text{and} \quad \lambda_i \ge 0 \ \text{for} \ 1 \le i \le m.$$

Here $|M|$ denotes the determinant of $M$, $M^{-1}$ is its inverse, and
$\langle u, v \rangle$ denotes the inner or dot product of the vectors $u$ and $v$.

In general, minimizing $F(\lambda)$ subject to the given constraints is
quite difficult. Investigations [2] showed that a good approximation to
the minimal $\lambda$ could be obtained if a simplifying assumption is made. The
assumption is that the average of the covariance matrices of the pure
signatures can be substituted for each $M_i$. By using the simplifying assumption
and applying a linear transformation which reduces the common covariance
matrix to the identity, the problem of estimating $\lambda$ becomes one of
minimizing a function $G(\lambda)$ of the form

$$G(\lambda) = \|y - A\lambda\|^2$$

subject to the constraints on $\lambda$. Now $y$ represents the transformed data
point, and $A\lambda$ the mean of the signature associated with the proportion
vector $\lambda$ after the pure signature means have also been transformed.

[11] Coberly, W.A., 1973, Serial Correlation of Spectral Measurements,
NASA Internal Memorandum, JSC, Houston.
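One way to realize the linear transformation described above is sketched below in Python. The report does not say which transformation the ERIM programs use; the choice of a Cholesky factor of the average covariance matrix is an assumption made for illustration, and the function names are hypothetical. If the average covariance is factored as L L^t, then applying the inverse of L to data points and signature means makes the common covariance the identity.

```python
def average_covariance(covs):
    """Element-wise average of a list of n-by-n covariance matrices."""
    m, n = len(covs), len(covs[0])
    return [[sum(c[r][s] for c in covs) / m for s in range(n)]
            for r in range(n)]


def cholesky(mat):
    """Lower-triangular factor `low` with low * low^t == mat."""
    n = len(mat)
    low = [[0.0] * n for _ in range(n)]
    for r in range(n):
        for s in range(r + 1):
            acc = mat[r][s] - sum(low[r][k] * low[s][k] for k in range(s))
            if r == s:
                if acc <= 0.0:
                    raise ValueError("matrix is not positive definite")
                low[r][s] = acc ** 0.5
            else:
                low[r][s] = acc / low[s][s]
    return low


def whiten(low, vec):
    """Apply the inverse of `low` to `vec` by forward substitution."""
    out = []
    for r in range(len(vec)):
        acc = vec[r] - sum(low[r][k] * out[k] for k in range(r))
        out.append(acc / low[r][r])
    return out


# Hypothetical two-channel covariance matrices for two pure signatures.
common = average_covariance([[[4.0, 0.0], [0.0, 9.0]],
                             [[4.0, 2.0], [2.0, 9.0]]])
factor = cholesky(common)
transformed_point = whiten(factor, [2.0, 0.5])
print(transformed_point)
```

The same `whiten` call would be applied to every pure signature mean and to every data vector before the distance $\|y - A\lambda\|^2$ is minimized.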

The problem of minimizing $G(\lambda)$ subject to the constraints on $\lambda$ can be
viewed geometrically. The set of points $A\lambda$, where $\lambda$ is a proportion
vector, is the convex hull of the $A_i$ and is called the signature simplex.
The problem is to find a proportion vector $\lambda$ such that $A\lambda$ is the point
in the signature simplex closest to the data point $y$.
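The closest-point problem can be illustrated with a small Python sketch. It does not reproduce the quadratic programming method used in MIXMAP; it substitutes a plain projected-gradient iteration, with a sort-based Euclidean projection onto the set of proportion vectors, because that is short enough to show in full. The function names, the step-size choice, and the iteration count are assumptions made for the example.

```python
def project_to_simplex(v):
    """Euclidean projection of v onto {x : x_i >= 0, sum x_i = 1}."""
    u = sorted(v, reverse=True)
    running, theta = 0.0, 0.0
    for k, uk in enumerate(u, start=1):
        running += uk
        t = (running - 1.0) / k
        if uk - t > 0:
            theta = t  # last k that satisfies the condition wins
    return [max(x - theta, 0.0) for x in v]


def estimate_proportions(vertices, y, iterations=2000):
    """Minimize ||y - sum_i lam_i * vertices[i]||^2 over proportion vectors.

    vertices -- list of m transformed signature means, each of length n
    y        -- transformed data vector of length n
    """
    m, n = len(vertices), len(y)
    # 2 * (sum of squared entries) bounds the gradient's Lipschitz constant.
    fro2 = sum(a * a for vert in vertices for a in vert)
    step = 1.0 / (2.0 * fro2) if fro2 > 0 else 1.0
    lam = [1.0 / m] * m
    for _ in range(iterations):
        fitted = [sum(lam[i] * vertices[i][k] for i in range(m))
                  for k in range(n)]
        resid = [fitted[k] - y[k] for k in range(n)]
        grad = [2.0 * sum(vertices[i][k] * resid[k] for k in range(n))
                for i in range(m)]
        lam = project_to_simplex([lam[i] - step * grad[i] for i in range(m)])
    return lam


# Hypothetical signature simplex: a triangle in two transformed channels.
triangle = [[0.0, 0.0], [1.0, 0.0], [0.0, 1.0]]
print(estimate_proportions(triangle, [0.25, 0.25]))  # data point inside
print(estimate_proportions(triangle, [1.0, 1.0]))    # data point outside
```

For a data point inside the triangle the estimate reproduces the point exactly; for a point outside, the estimate corresponds to the nearest point on an edge or vertex, so some proportions come out as zero.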

The optimal $\lambda$ will be unique if the signature simplex is non-degenerate,
i.e., has positive $(m-1)$-dimensional volume. This is equivalent to the
$(n+1)$-dimensional vectors $(A_i, 1)$ being linearly independent. Non-degeneracy
of the signature simplex implies that the number of materials $m$ in the pure
signature set does not exceed the number of spectral channels $n$ by more
than one.
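The non-degeneracy condition can be checked numerically by appending a 1 to each signature mean and testing the resulting vectors for linear independence. The following Python sketch does this with Gaussian elimination; the function names and the tolerance are hypothetical choices, not part of the ERIM software.

```python
def matrix_rank(rows, tol=1e-9):
    """Rank of a matrix (list of rows) by Gaussian elimination with pivoting."""
    mat = [[float(x) for x in row] for row in rows]
    n_rows = len(mat)
    n_cols = len(mat[0]) if mat else 0
    rank = 0
    for c in range(n_cols):
        if rank == n_rows:
            break
        pivot = max(range(rank, n_rows), key=lambda r: abs(mat[r][c]))
        if abs(mat[pivot][c]) <= tol:
            continue  # no usable pivot in this column
        mat[rank], mat[pivot] = mat[pivot], mat[rank]
        for r in range(rank + 1, n_rows):
            factor = mat[r][c] / mat[rank][c]
            for k in range(c, n_cols):
                mat[r][k] -= factor * mat[rank][k]
        rank += 1
    return rank


def simplex_is_nondegenerate(vertices):
    """True if the augmented vectors (A_i, 1) are linearly independent."""
    augmented = [list(v) + [1.0] for v in vertices]
    return matrix_rank(augmented) == len(vertices)


print(simplex_is_nondegenerate([[0.0, 0.0], [1.0, 0.0], [0.0, 1.0]]))
print(simplex_is_nondegenerate([[0.0, 0.0], [1.0, 1.0], [2.0, 2.0]]))
```

Four signatures in two channels always fail the test, since four vectors of length three cannot be linearly independent; this is the limit of channels plus one stated above.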

The problem of minimizing $G(\lambda)$ can be identified as a quadratic programming
problem. A program adapting the Theil and van de Panne method for solving this
type of problem is used to estimate the proportions of object classes within
a pixel. Details may be found in References [2,4,12]. The computer program
is called MIXMAP, and in view of the fact that other procedures for estimating
proportions are introduced in Sections 4 and 5, we shall refer to this
basic algorithm as the MIXMAP procedure. It requires about 20 msec to
estimate a mixture of 5 materials with 12 channels of data.

[4] Horwitz, H.M., P.D. Hyde, and W. Richardson, 1974, Improvements in
Estimating Proportions of Objects From Multispectral Data,
Report No. 190100-25-T, Environmental Research Institute of
Michigan, Ann Arbor.

[12] Kunzi, H.P., W. Krelle, and W. Oettli, 1966, Nonlinear Programming,
Blaisdell Publishing Co., Boston.

3.3.1 DATA AVERAGING 

In order to reduce computation time, the MIXMAP program has a data
averaging option [4]. This option provides for averaging a number of data
points and then estimating proportions of the target classes in the region
corresponding to the totality of the data points averaged. This averaging
procedure reduces computation time by a factor approximately equal to the
number of data points averaged. It also has theoretical advantages in that
the estimates of proportions are asymptotically unbiased in an ideal
situation. However, up to now, results of limited tests on LANDSAT data
using data averaging have not been impressive. More testing is necessary
in order to evaluate this procedure more completely.
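The averaging step itself is simple, as the following Python sketch suggests. How the MIXMAP option groups data points is not described here, so grouping consecutive data vectors into fixed-size blocks is an assumption, and `block_average` is a hypothetical name. One proportion estimate would then be computed per averaged vector instead of one per pixel.

```python
def block_average(signals, block_size):
    """Average consecutive groups of `block_size` data vectors.

    signals    -- list of data vectors (lists of equal length)
    block_size -- number of data points per average
    A trailing group with fewer than `block_size` vectors is dropped.
    """
    if block_size < 1:
        raise ValueError("block_size must be at least 1")
    averaged = []
    for start in range(0, len(signals) - block_size + 1, block_size):
        block = signals[start:start + block_size]
        n = len(block[0])
        averaged.append([sum(v[k] for v in block) / block_size
                         for k in range(n)])
    return averaged


# Five hypothetical two-channel data vectors averaged in pairs.
pixels = [[1, 2], [3, 4], [5, 6], [7, 8], [9, 10]]
print(block_average(pixels, 2))
```

With blocks of nine points, for example, the estimator would be run on roughly one ninth as many vectors, which is the source of the time saving described above.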

3.4 EQUAL COVARIANCE ASSUMPTION 

The substitution of the average covariance matrix for the individual
covariance matrices of the different object classes has been criticized.
This assumption was adopted to facilitate the computation of proportion
estimates, after simulation runs using typical agricultural signature
sets had been made to test the validity of the substitution. Results indicated that
this approximation was reasonable. But the decisive factor in making this
substitution was the fact that we know of no reasonable numerical
procedure for obtaining the exact maximum likelihood proportion estimate,
nor has anyone recommended any appropriate alternative procedure.

3.5 DETECTION OF ALIEN OBJECTS 

Estimating proportions of unresolved objects from a signal y is based
on the assumption that the signal comes from a pixel which contains a mixture
of materials. These materials are represented by known signatures that
constitute the pure signature set. If the pixel should contain a material
not represented in the signature set, significant additional error in the
estimate of proportions may result. The amount of this error depends upon
the proportion of these alien materials and the geometric relationship
of their signatures to those in the pure signature set. Those materials
occurring in a scene but not represented in the pure signature set are
referred to as alien materials or alien objects. Procedures have been
designed to reduce the error resulting from the presence of alien objects.
These procedures take the form of thresholding tests, hence the designation
"alien object threshold."

One might attempt to avoid the alien object problem by obtaining 
signatures for all materials present in the scene. This approach is usually 
impractical because of the large number of materials present and the 
impossibility of obtaining definitive signatures for many of them. An 
alternative is to use essentially a chi-square test as in conventional
recognition processing. 

The new mixtures program contains improved procedures for dealing
with alien objects. These procedures can be described most easily in terms
of the pure signature set and signals after a linear transformation has
been employed. After this transformation, we assume that the $i$-th material
in the pure signature set has mean $A_i$, and its covariance matrix is the
identity. Now given a signal (data point) $y$ from a pixel with unknown
proportions of various materials, the estimate $\hat{\lambda}$ of the proportion is
obtained as follows. Let $Z$ denote the point in the signature simplex
closest to $y$. Then $Z$ may be represented in the form

$$Z = A\hat{\lambda}$$

where $\hat{\lambda}$ is a proportion vector and is taken as the estimate of proportions
in the pixel represented by the signal $y$. In order to apply an alien object
test, we ask, "What is the probability that we would have observed a signal
at least as far from $Z$ as $y$ if the true proportion vector of the pixel
were $\hat{\lambda}$?" Assuming
Gaussian signature distributions, this amounts to a chi-square test with $n$
degrees of freedom, where $n$ is the number of spectral channels used. The
level of significance is determined by a value $\chi_0^2$ which is the alien
object threshold. If

$$\|y - Z\|^2 = \|y - A\hat{\lambda}\|^2 > \chi_0^2$$

then the estimate fails the chi-square test; we then say that the pixel
contains significant amounts of alien materials and make no estimate of
proportions for the pixel in question. If the estimate passes the test,
we accept it as the estimate of proportions of materials in the pixel
in question.
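The thresholding test can be sketched in a few lines of Python. The function names are hypothetical, and the closest simplex point is taken as an input rather than computed. For the special case of two spectral channels the chi-square distribution with two degrees of freedom has the closed-form tail probability exp(-x/2), so the threshold for a chosen significance level can be written down directly; for other channel counts a chi-square table or a statistics library would be needed.

```python
import math


def is_alien(y, z, threshold):
    """True if the squared distance from y to z exceeds the threshold.

    y         -- transformed data vector
    z         -- closest point of the signature simplex to y
    threshold -- alien object threshold (a chi-square critical value)
    """
    dist2 = sum((a - b) ** 2 for a, b in zip(y, z))
    return dist2 > threshold


def threshold_two_channels(alpha):
    """Chi-square critical value for 2 degrees of freedom.

    Valid only for n = 2 channels, where P(X > x) = exp(-x / 2).
    """
    if not 0.0 < alpha < 1.0:
        raise ValueError("alpha must lie strictly between 0 and 1")
    return -2.0 * math.log(alpha)


# Hypothetical two-channel example at the 5 percent significance level.
chi0_sq = threshold_two_channels(0.05)
print(is_alien([3.0, 0.0], [1.0, 0.0], chi0_sq))  # squared distance 4
print(is_alien([4.0, 0.0], [1.0, 0.0], chi0_sq))  # squared distance 9
```

A pixel flagged by `is_alien` would receive no proportion estimate; a larger significance level lowers the threshold and rejects more pixels.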

3.6 SIGNATURE ANALYSIS

The quality of the estimates of proportions one can expect can be 
determined to a large extent by examining the pure signature set. In 
conventional recognition processing we know that the quality of results 
depends upon the distances between pairs of signature means relative to 
their spreads (covariances). When these distances are large, good results 
can be expected. This requirement is also necessary for good proportion
estimates, but a more stringent condition must be satisfied as well: that no
pure signature be close, in a probability sense, to any signature of a mixture
of the other materials.

A feature of the MIXMAP program is a simple test called geometric
signature analysis (GEOM). We deal with the transformed signature simplex
with vertices $A_i$, $1 \le i \le m$, and assume that the common covariance matrix
of all the transformed signatures is the identity. Let $r_i$ be the distance
of $A_i$ to the closest point in the hyperplane through the face of the
signature simplex opposite $A_i$. The face opposite $A_i$ is the convex hull
of all the vertices $A_j$ except for $A_i$. Then $r_i$ measures this distance, in
standard deviation units of $A_i$, to the mean of a mixture of the other
materials in the signature set. If some $r_i$ is small, we would expect data
points representing $A_i$ to be confused with data points representing
mixtures of the other materials. Figure 3 illustrates a signature simplex
well-conditioned for proportion estimation. The circles at the vertices
indicate the spread of the distributions at the vertices; these circles
are formed by the points which are one standard deviation away from the vertex.
Each vertex is several standard units away from the closest point in the
opposite hyperplane. Figure 4, on the other hand, shows an example of an
ill-conditioned signature simplex. One of the pure signature means there is
less than a standard deviation away from the closest point in the opposite
hyperplane.
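
The distance $r_i$ described above can be sketched with ordinary Gram-Schmidt orthogonalization. This is an illustration only, assuming the signature means have already been transformed so that the common covariance is the identity; the function name `dist_to_opposite_hyperplane` is hypothetical and is not the GEOM program itself.

```python
import math


def dist_to_opposite_hyperplane(vertices, i):
    """Distance from vertices[i] to the hyperplane through all other vertices."""
    others = [v for k, v in enumerate(vertices) if k != i]
    base = others[0]
    # Build an orthonormal basis for the directions spanning the opposite face.
    basis = []
    for v in others[1:]:
        d = [a - b for a, b in zip(v, base)]
        for q in basis:
            proj = sum(x * y for x, y in zip(d, q))
            d = [x - proj * y for x, y in zip(d, q)]
        norm = math.sqrt(sum(x * x for x in d))
        if norm > 1e-12:
            basis.append([x / norm for x in d])
    # Strip from (A_i - base) everything lying in the face; the rest is r_i.
    r = [a - b for a, b in zip(vertices[i], base)]
    for q in basis:
        proj = sum(x * y for x, y in zip(r, q))
        r = [x - proj * y for x, y in zip(r, q)]
    return math.sqrt(sum(x * x for x in r))


# Three signatures in two channels (a 3-4-5 right triangle).
triangle = [(0.0, 0.0), (4.0, 0.0), (0.0, 3.0)]
print([dist_to_opposite_hyperplane(triangle, i) for i in range(3)])
```

Small values of any returned distance would correspond to the ill-conditioned situation of Figure 4.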
FIGURE 3. WELL-CONDITIONED SIGNATURE SIMPLEX

FIGURE 4. ILL-CONDITIONED SIGNATURE SIMPLEX
3.7 EXTENSIONS OF BASIC PROPORTION ESTIMATION PROCEDURE

In order to improve performance, the basic ERIM proportion estimation 
procedure (MIXMAP) has been extended in two directions. One of these 
extensions results from using prior information about the probable content 
of pixels. Normally, a majority of pixels are pure (contain a single 
material). When a mixture pixel occurs, it generally contains a small 
number of component materials; say 2, 3, or 4. The LIMMIX procedure, 
described in Section 4, incorporates this kind of prior information. 

The other extension results from utilizing spatial information in
order to restrict further the combinations of object classes which can
occur simultaneously within a single pixel. The spatial information
employed consists of the signals from adjoining pixels. The resulting
procedure is referred to as nine-point mixtures and is treated in Section
5. It will become clear that nine-point mixtures may be considered an
extension of the LIMMIX concept.

Both the LIMMIX and nine-point mixtures procedures have a very important
advantage over MIXMAP, especially when the number of spectral channels of
information is relatively small as in LANDSAT data. It has been pointed
out in Section 3.3 that a necessary requirement for the suitability of
MIXMAP processing is that the size m of the signature set and the number
n of spectral channels be such that

$$m \le n + 1$$

Thus, for example, the maximum size of the signature set permissible for
MIXMAP processing of LANDSAT data is 5.

The corresponding restriction for LIMMIX and nine-point mixtures
processing is much milder although more complicated. Let L denote the
maximum number of object classes which are assumed to occur simultaneously
in a single pixel. Then a necessary condition for the suitability of
LIMMIX or nine-point mixtures processing may be expressed by the following
two inequalities:

$$L \le n + 1 \quad \text{when } L = m$$

and

$$L \le n \quad \text{when } L < m$$

Thus for LANDSAT data any size signature set will satisfy this condition
as long as L does not exceed 4.
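
The two necessary conditions can be written as a pair of small checks. This is a sketch only; the function names are hypothetical, and the conditions are necessary rather than sufficient, so a True result does not by itself guarantee good estimates.

```python
def mixmap_suitable(m, n):
    """Necessary condition for MIXMAP: m signatures, n channels."""
    return m <= n + 1


def limmix_suitable(L, m, n):
    """Necessary condition for LIMMIX or nine-point mixtures processing."""
    if L > m:
        raise ValueError("L cannot exceed the size of the signature set")
    return L <= n + 1 if L == m else L <= n


# Four-channel LANDSAT data.
print(mixmap_suitable(5, 4), mixmap_suitable(6, 4))
print(limmix_suitable(4, 13, 4), limmix_suitable(5, 13, 4))
```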
4

UTILIZATION OF PRIOR INFORMATION IN ESTIMATING PROPORTIONS 


The experience gained at ERIM with estimating proportions of unresolved
objects has led to a number of modifications of the mixtures algorithm.
Many of these modifications are similar in that they place limitations on the
combinations of object classes which are assumed to occur in a single pixel.
Methods for implementing such limitations appear to be of two types. The
first type depends on spectral characteristics only, while the second type
depends on both spectral and spatial characteristics. The LIMMIX procedure,
described in this section, is of the first type, while nine-point mixtures,
presented in Section 5, is of the second type.

Techniques which support the LIMMIX procedure are also described in 
this section. In addition, results of preliminary tests are presented. 

4.1 LIMMIX PROGRAM

We have found that the number of object classes which occur simultaneously
in a single pixel is very limited. LIMMIX exploits this fact. It assumes
that no pixel contains more than L, $L \le 4$ (L is a parameter), object
classes simultaneously. In order to facilitate testing and evaluation,
the LIMMIX program produces a tape output for further processing. This
tape will now be described. Figure 5 is a record of the tape generated
for each data point assuming the parameter L was taken to be four. The
first four positions give the results for the maximum likelihood single class.
Here $\lambda = 1$ because the pixel is all class $C_1$. Then the likelihood value ($a_1$)
of the data point is stored along with the chi-squared value ($d_1^2$).
The next five entries record the best two at a time choice for the data
point. The two $\lambda$'s are the proportions of the two materials found best and
$C_2$ codes the particular pair chosen. $a_2$ is the likelihood of the data point with
respect to the signature of this best mixture of two object classes, and $d_2^2$
is the chi-squared value of the data point with respect to the signature of
this pair. Similarly the next six entries on the tape record are the best mixture
of a combination of three at a time, and the last seven entries record the best
mixture of a combination of 4 object classes. Best is used in the sense of
maximum likelihood.
$\lambda$ = Proportion
$C_j$ = Combination
$a_j$ = Likelihood
$d_j^2$ = Chi-Squared Distance

FIGURE 5. ONE RECORD OF THE LIMMIX OUTPUT TAPE
LIMMIX uses the MIXMAP procedure for determining the best mixture of K
classes at a time. For example, to find the best three at a time mixture, all
subsets of three classes are considered. For each of these subsets, the
best mixture of the three classes is obtained via MIXMAP along with the
likelihood of the mixture. It is that mixture of three classes yielding the
maximum likelihood value for the data point that finally appears on the output
tape.
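
The subset search can be illustrated for the one and two at a time cases. The sketch below assumes transformed signatures with identity covariance, so that maximum likelihood is the same as minimum squared distance, and it replaces the MIXMAP fit by a direct projection onto the segment joining two distinct means; the function names are hypothetical and larger subsets are not handled.

```python
from itertools import combinations


def sq_dist(p, q):
    return sum((a - b) ** 2 for a, b in zip(p, q))


def best_single(y, means):
    """Return (class index, chi-squared) of the nearest pure signature."""
    return min(((i, sq_dist(y, a)) for i, a in enumerate(means)),
               key=lambda t: t[1])


def best_pair(y, means):
    """Return ((i, j), (lambda_i, lambda_j), chi-squared) of the best pair."""
    best = None
    for i, j in combinations(range(len(means)), 2):
        a, b = means[i], means[j]
        ab = [bk - ak for ak, bk in zip(a, b)]
        # Project y onto the line through a and b, then clip to the segment.
        t = sum((yk - ak) * dk for yk, ak, dk in zip(y, a, ab))
        t = min(1.0, max(0.0, t / sum(dk * dk for dk in ab)))
        z = [ak + t * dk for ak, dk in zip(a, ab)]
        cand = ((i, j), (1.0 - t, t), sq_dist(y, z))
        if best is None or cand[2] < best[2]:
            best = cand
    return best


means = [(0.0, 0.0), (4.0, 0.0), (0.0, 4.0)]
print(best_single((1.0, 0.5), means))
print(best_pair((1.0, 0.5), means))
```

Each call mirrors one group of entries on the output record: the winning combination, its proportions, and its chi-squared value.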


In order to obtain results from the LIMMIX tape, further processing must
occur. The present processing approach is summarized below. Say the parameter
value L is three. Then we choose three threshold values
$\chi_1^2$, $\chi_2^2$, and $\chi_3^2$. If

$$d_1^2 < \chi_1^2$$

then the pixel is all class $C_1$. If

$$d_1^2 > \chi_1^2 \quad \text{and} \quad d_2^2 < \chi_2^2$$

then the pixel is taken to contain the mixture associated with the pair $C_2$
on the LIMMIX tape. If

$$d_1^2 > \chi_1^2, \quad d_2^2 > \chi_2^2, \quad d_3^2 < \chi_3^2$$

then the pixel is said to contain the mixture associated with the combination
$C_3$ on the LIMMIX tape. If

$$d_1^2 > \chi_1^2, \quad d_2^2 > \chi_2^2, \quad d_3^2 > \chi_3^2$$

then the pixel is taken to contain alien (unknown) materials. Further details
of LIMMIX are contained in Appendix B.
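
The threshold logic above amounts to accepting the smallest combination size whose chi-squared value falls below its threshold. A minimal sketch follows; the function name and the numeric values are illustrative assumptions, and the case of exact equality, which the text does not address, is treated here as a failure of the test.

```python
def limmix_decision(d_sq, chi_sq):
    """d_sq[k-1] is the chi-squared value of the best k-at-a-time mixture and
    chi_sq[k-1] the matching threshold. Returns the accepted k, or None when
    the pixel is taken to contain alien materials."""
    for k, (d, limit) in enumerate(zip(d_sq, chi_sq), start=1):
        if d < limit:
            return k
    return None


thresholds = (6.0, 5.0, 4.0)
print(limmix_decision((2.0, 1.5, 1.0), thresholds))    # pure pixel
print(limmix_decision((9.0, 3.0, 2.0), thresholds))    # pair
print(limmix_decision((9.0, 7.0, 3.0), thresholds))    # triple
print(limmix_decision((9.0, 7.0, 6.0), thresholds))    # alien
```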
4.2 ALIEN2

A computer program, ALIEN2, was developed to operate with LIMMIX to
facilitate experimentation. The current version of LIMMIX, as described above,
puts all of the calculated results on an output tape, without deciding which
k-signatures-at-a-time winner to accept as an overall winner. ALIEN2 then uses
this output tape as input, and permits a wide range of decision rules ($\chi_k^2$
parameters). In effect, LIMMIX is run many times, using only one output tape per
scene. ALIEN2 also tabulates the results for each parameter setting, making it
relatively easy to evaluate the working parameters of LIMMIX.

In a production set-up (i.e., when it is known how to set the $\chi_k^2$ parameters)
the two programs will be combined, with no intermediate tape generated. Since
most of the pixels in a scene are pure, it will not always be necessary to calculate
the most likely pair, triple, etc., of signatures. For instance, if the chi-square
distance from the most likely signature to the pixel is within the limit set by
the $\chi_1^2$ parameter, the algorithm will call this signature the solution, and go
on to the next pixel. If the chi-square distance is greater than $\chi_1^2$, a search
will be made to find the most likely signature pair whose distance is less than
the $\chi_2^2$ parameter. This process will continue until the pixel is either
designated as some combination or is checked as alien. Details of ALIEN2 are
in Appendix C.
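
The saving described for the production set-up comes from evaluating the larger combinations only when the smaller ones fail. The sketch below shows that control flow under the assumption that each fit is available as a callable returning a (mixture, chi-squared) pair; the names `classify_lazily`, `fit1`, and `fit2` and the class labels are hypothetical.

```python
def classify_lazily(fitters, thresholds):
    """fitters[k-1]() computes the best k-at-a-time mixture and its
    chi-squared value; it is called only if all smaller sizes were rejected.
    Returns (k, mixture), or (None, None) for an alien pixel."""
    for k, (fit, limit) in enumerate(zip(fitters, thresholds), start=1):
        mixture, d_sq = fit()
        if d_sq < limit:
            return k, mixture
    return None, None


calls = []

def fit1():
    calls.append(1)
    return ("wheat",), 2.0

def fit2():
    calls.append(2)
    return ("wheat", "fallow"), 1.0

print(classify_lazily([fit1, fit2], [5.0, 5.0]), calls)
```

In this example the pair fit is never run, because the single class already passes its threshold.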

4.3 GEOMETRICAL SIGNATURE ANALYSIS 

A prime factor affecting the performance of LIMMIX is the geometrical
configuration formed by the signatures of the object classes occurring in the
scene. In the previous ERIM mixtures approach implemented by the program
MIXMAP, geometrical signature analysis (program GEOM) is normally performed on
the signature set to determine its adequacy for MIXMAP processing. GEOM supplies
measures of how close (in a probability sense) each signature mean is to a point
in the hyperplane through the other signature means of the signature simplex.

The larger these distances are, the more nondegenerate is the signature
simplex in a probabilistic sense, and the more suitable is the scene for MIXMAP
processing. When the number of signatures m in the signature set and the
number n of channels of data are such that

$$m > n + 1$$

it follows that at least one of these distances is zero, which means that
the signature simplex is degenerate, and the associated scene is unsuitable
for MIXMAP processing because the maximum likelihood proportions estimate is
then ambiguous. Thus a necessary requirement for the appropriateness of MIXMAP
processing is that

$$m \le n + 1$$

This requirement can be a severe limitation, especially when the number of
channels of information is relatively small as in LANDSAT data. The corresponding
conditions for LIMMIX processing which limit the values of parameter L may be
stated as follows:

A necessary condition for the suitability of LIMMIX processing is that
every subset of L + 1 or fewer signature means form a nondegenerate simplex.

When L = m, the limitation is $L \le n + 1$. When L < m, the limitation is

$$L \le n$$

Thus, in theory, we can use LIMMIX processing with L=4 on LANDSAT four-channel
data with any size signature set. Figure 6 illustrates an example of 6 signature
means and 2 channel data. Any subset of 4 or more of these signature means forms a
degenerate simplex, but any subset of 3 or fewer forms a nondegenerate simplex;
therefore, the data associated with this signature set might be suitable for LIMMIX
processing with parameter value L=2. To obtain a more quantitative
measure of suitability of a signature set for LIMMIX processing, geometrical
signature analysis is performed on each subset of L+1 signature means. The
requirement for suitability is that each of the L+1 distances obtained for
each of the $\binom{m}{L+1} = \frac{m!}{(L+1)!\,(m-L-1)!}$ subsets be adequately large.
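
The amount of work implied by this requirement grows with the binomial coefficient. A brief sketch of the count, using a hypothetical helper name, is given here; it enumerates the subsets explicitly so that each could be passed on to a distance computation.

```python
from itertools import combinations
from math import comb


def geom2_workload(m, L):
    """Return (number of subsets of L+1 signatures, number of distances)."""
    subsets = list(combinations(range(m), L + 1))
    assert len(subsets) == comb(m, L + 1)
    return len(subsets), len(subsets) * (L + 1)


# Six signatures with L = 2: every triple of signatures is examined.
print(geom2_workload(6, 2))
```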

The distances obtained for the geometrical signature analysis for
LIMMIX processing (GEOM2) will now be defined more precisely. To avoid
notational complexity we will assume that a specific subset of L+1 signatures
has been chosen and relabeled, if necessary, so that their means are denoted
by $A_1, A_2, \ldots, A_{L+1}$ and covariance matrices by $M_1, M_2, \ldots, M_{L+1}$.
Let $H_1$ denote the hyperplane of dimension L-1 through the means
$A_2, \ldots, A_{L+1}$. Let Z be the point in $H_1$ which maximizes the Gaussian
density with parameters $A_1$, $M_1$. Then $d_1$ is defined by

$$d_1^2 = (Z - A_1)^T M_1^{-1} (Z - A_1)$$

Figure 7 is an illustration for the case L=2 and n=2. In this example,
$d_1$ is approximately 3.

There is an interpretation of the distances $d_i$ associated with a
simplex that may be helpful. It is understood most easily when the
covariance matrices are all equal and the usual transformation to the
identity is utilized. Then the radius r of the largest inscribed sphere
is given by

$$\frac{1}{r} = \sum_{i=1}^{L+1} \frac{1}{d_i}$$

Then r may be taken as a summary measure of the suitability of the simplex.
When covariance matrices are not equal, then r as given by this formula,
although lacking a simple geometric interpretation, appears to have merit.
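
The summary measure is simple to evaluate from the distances. As a check under the equal covariance assumption, a 3-4-5 right triangle has altitudes 4, 3, and 2.4 and an inscribed circle of radius 1; the function name below is hypothetical.

```python
def summary_radius(distances):
    """Summary measure r defined by 1/r = sum of 1/d_i over the simplex."""
    return 1.0 / sum(1.0 / d for d in distances)


print(summary_radius([4.0, 3.0, 2.4]))
```

One small distance dominates the sum, so r falls sharply whenever a single vertex lies close to its opposite hyperplane.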

Table 1 displays the output from GEOM2 with respect to a CITARS* [13] data
set. This data was gathered 21 August 1973 over Fayette County, Illinois. The
target crops were corn and soybeans. The data was used in tests reported in
Section 5.2. The signature set contained six classes and the limit L was taken
to be two. Thus all possible combinations of three materials required examination.
Since there are 20 of these combinations, there are 20 rows in the table. In
the first row, first column, for example, 1.7 is the closest distance (measured
in standard deviation units of the corn signature) of the mean of the corn signature
to the line through the means of soy and trees. In the second column of the
first row the entry 2.3 is the closest distance (measured in standard deviation
units of the soy signature) of the mean of the soy signature to the line through
the means of corn and trees. Overall, the distances are fairly large, although
the 0.7 for bare soil versus soy and weeds may indicate possible difficulties.

TABLE 1. DISTANCES CALCULATED BY GEOM2, Fayette Co.,
21 August 1973 (Units of Standard Deviation)

    CORN     SOY    TREE    BARE  CLOVER    WEED
     1.7     2.3     9.3       -       -       -
     2.7     2.2       -     3.6       -       -
     1.6     2.4       -       -     4.4       -
     1.9     2.2       -       -       -     9.9
     1.1       -     2.6     5.0       -       -
     2.5       -     9.2       -     2.9       -
     1.6       -     3.5       -       -    10.8
     2.2       -       -     1.5     5.2       -
     3.4       -       -     1.0       -     1.8
     2.5       -       -       -     5.1     2.4
       -     2.7     9.1     3.8       -       -
       -     3.7     7.7       -     2.8       -
       -     2.9     9.4       -       -     9.8
       -     3.3       -     1.5     4.1       -
       -     5.9       -     0.7       -     3.5
       -     3.4       -       -     4.1    14.9
       -       -     3.4     1.5     2.7       -
       -       -    18.9     1.1       -     1.0
       -       -     3.0       -     2.7    12.1
       -       -       -     1.5    11.5  (illegible)

*CITARS was a joint research task for Crop Identification Technology Assessment for
Remote Sensing.

[13] Malila, W.A., D.P. Rice, and R.C. Cicone, 1975, Final Report on the CITARS
Effort by the Environmental Research Institute of Michigan, Report 109600-12-F,
ERIM, Ann Arbor, Michigan.

4.4 CLUSTERING PROGRAM

We have found that poor multispectral data processing results are often 
due to signatures which are not representative of ground class distributions. 

This in turn may stem from two sources: (1) an insufficient number of data 

points to obtain a good estimate, and (2) the incorrect determination of the 
number of modes of the distribution. 

The large error that may be introduced in this manner often makes it 
difficult to evaluate the efficacy of a classification procedure. Clustering 
algorithms offer hope of a solution. These algorithms may be loosely defined as 
algorithms which identify data points which are 'alike'. Because this project
has been hampered by the errors arising from this problem, suitable algorithms 
were developed. 

4.4.1. Description 

To provide versatility, three different algorithms were incorporated into 
the program. 

Algorithm one uses small, normal distributions to approximate the cumulative 
distribution function of the ground classes in a scene. Then it combines these 
elements, on the basis of high probability of misclassification, to form signatures.
A description of this follows. 

(1) Suppose we have m cells $\Gamma_1, \ldots, \Gamma_m$, with means $A_i$ and variances
$(\sigma_{i1}^2, \ldots, \sigma_{in}^2)$, $1 \le i \le m$, where n is the number of channels.
Let $K_i$ denote the number of samples within the i-th cell. Given a new sample X,
calculate the distance of X from each cell center by

$$d(X, A_i) = \sum_{j=1}^{n} \frac{(X_j - A_{ij})^2}{\sigma_{ij}^2} \qquad (i = 1, \ldots, m)$$

Find r such that $d(X, A_r) = \min_i d(X, A_i)$, $1 \le i \le m$.
Then X is classified as one of the following.

If $d(X, A_r) < \tau$ then X is assigned to $\Gamma_r$.

If $d(X, A_r) > \theta$ then X creates a new cell $\Gamma_{m+1}$;
otherwise X is stored.


(2) When a new sample is classified to the i-th cell, this cell's
parameters are adjusted as follows:

(a) increase the number of samples ($K_i$) by one

(b) calculate a new mean vector ($A_i$)

$$A_i = \frac{1}{K_i} \sum_{X \in \Gamma_i} X$$

(c) determine new variances by

$$\sigma_{ij}^2 = \max\left(\sigma_{ij}^2(0),\; s_{ij}^2\right)$$

where

$$s_{ij}^2 = \frac{1}{K_i} \sum_{\ell=1}^{K_i} (X_{\ell j} - A_{ij})^2$$

where the $X_\ell$ are classified to the i-th cell and $\sigma_{ij}^2(0)$ is an initial
assignment of $\sigma_{ij}^2$. Only when $s_{ij}^2$ exceeds $\sigma_{ij}^2(0)$ do we replace
$\sigma_{ij}^2(0)$ with $s_{ij}^2$.
(3) The first sample always creates a new cell. The second sample is
tested and classified by (1) and so on. When all samples have been classified,
the stored samples are forced into the nearest cells according to (1). Each
cluster is then tested against every other cluster for a high probability of
misclassification. Whenever two clusters are found to have a high probability
of misclassification, they are combined with a weighting based on the number of
points in the clusters. This process is iterated until one cluster has more
than a certain percentage of the points, or the largest several clusters have
more than some other percentage of the points. The measure of probability
of misclassification used is:

$$p = \left(P(W_1)\,P(W_2)\right)^{1/2} \exp\left(-(A_2 - A_1)^T \left(\frac{M_1 + M_2}{2}\right)^{-1} (A_2 - A_1)\right)$$

where $W_1$ and $W_2$ are the two classes involved. The A and M symbols stand for the
mean vectors and covariance matrices of the two clusters.
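
Steps (1) and (2), together with the forcing of stored samples in step (3), can be sketched as follows. This is an illustration under stated assumptions: the distance is taken to be the variance-normalized squared difference summed over channels, the sample variance is divided by the cell count, and the names `Cell`, `cluster`, `tau`, and `theta` are hypothetical. The merging of clusters by probability of misclassification is not included.

```python
def cell_distance(x, mean, var):
    return sum((xj - aj) ** 2 / vj for xj, aj, vj in zip(x, mean, var))


class Cell:
    def __init__(self, x, var0):
        self.samples = [list(x)]
        self.var0 = list(var0)      # initial variances, used as a floor
        self.mean = list(x)
        self.var = list(var0)

    def add(self, x):
        self.samples.append(list(x))
        k, n = len(self.samples), len(x)
        self.mean = [sum(s[j] for s in self.samples) / k for j in range(n)]
        s2 = [sum((s[j] - self.mean[j]) ** 2 for s in self.samples) / k
              for j in range(n)]
        self.var = [max(v0, s) for v0, s in zip(self.var0, s2)]


def cluster(samples, var0, tau, theta):
    cells, stored = [], []
    for x in samples:
        if not cells:               # the first sample always creates a cell
            cells.append(Cell(x, var0))
            continue
        d, nearest = min(((cell_distance(x, c.mean, c.var), c) for c in cells),
                         key=lambda t: t[0])
        if d < tau:
            nearest.add(x)
        elif d > theta:
            cells.append(Cell(x, var0))
        else:
            stored.append(x)
    for x in stored:                # force stored samples into nearest cells
        min(cells, key=lambda c: cell_distance(x, c.mean, c.var)).add(x)
    return cells


pts = [(0.0, 0.0), (0.5, 0.0), (10.0, 10.0), (10.5, 10.0), (2.0, 0.0)]
print([len(c.samples) for c in cluster(pts, (1.0, 1.0), 1.0, 9.0)])
```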

Algorithm two is almost identical to algorithm one, except that it is a 
supervised algorithm, i.e., each data point is labeled (by crop class) and 
algorithm one is carried out separately for each class. 

Algorithm three is an unsupervised, iterative algorithm which estimates
the means and variances of ground class distributions. It is, in part, similar
to NSPACE, developed by Eigen and Northouse at the University of Wisconsin [14].
Algorithm three proceeds as follows. First, the user inputs his initial guess
of starting means and variances, or allows the program to spread starting means
evenly throughout the data space, with a common starting variance. Data points
are then classified to these means using either the standard metric or the
linear Bayes decision rule. The estimates of each mean and variance may be
updated every time a data point is classified to that mean, or after each scan
line or region. The new means and variances are used for further classification.
This process is repeated until the estimates of the means and variances change
very little from iteration to iteration. Further details are contained in
Appendix E.
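
The iteration described for algorithm three can be sketched as below. The sketch assumes a variance-normalized distance as the classification metric, updates the estimates once per full pass rather than per point or per scan line, and adds a small variance floor to avoid division by zero; these choices and the name `refine_estimates` are assumptions made for illustration.

```python
def refine_estimates(data, means, variances, tol=1e-6, max_iter=100,
                     var_floor=1e-6):
    """Alternate classification and re-estimation until estimates settle."""
    n = len(means[0])
    for _ in range(max_iter):
        groups = [[] for _ in means]
        for x in data:
            nearest = min(
                range(len(means)),
                key=lambda i: sum((x[j] - means[i][j]) ** 2 / variances[i][j]
                                  for j in range(n)))
            groups[nearest].append(x)
        new_means, new_vars = [], []
        for g, a, v in zip(groups, means, variances):
            if not g:               # nothing classified here: keep estimate
                new_means.append(list(a))
                new_vars.append(list(v))
                continue
            mu = [sum(x[j] for x in g) / len(g) for j in range(n)]
            var = [max(sum((x[j] - mu[j]) ** 2 for x in g) / len(g), var_floor)
                   for j in range(n)]
            new_means.append(mu)
            new_vars.append(var)
        change = max(abs(p - q)
                     for old, new in ((means, new_means), (variances, new_vars))
                     for a, b in zip(old, new)
                     for p, q in zip(a, b))
        means, variances = new_means, new_vars
        if change < tol:
            break
    return means, variances


data = [(0.0,), (1.0,), (10.0,), (11.0,)]
print(refine_estimates(data, [(2.0,), (8.0,)], [(1.0,), (1.0,)]))
```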


[14] Eigen, D.J., and R.A. Northouse, 1972, N Space - An Unsupervised Clustering
Algorithm Based on Discretized Marginal Distributions, Report No. TR-AI-72-3,
The University of Wisconsin, Milwaukee.
4.4.2 EFFECTIVENESS

It was found that these algorithms, especially one and two, produce highly 
accurate signatures. They have been useful in analyzing variations in the data, 
multi-modality, and identifying troublesome ’other’ classes. The use of these 
algorithms has reduced the error stemming from poor correspondence between 
signatures and ground class distributions. This has resulted in better evaluation 
of classification schemes. 

4.5 PROBLEM OF ESTABLISHING LIMMIX PARAMETERS 

The effectiveness of the LIMMIX procedure is dependent on setting the
parameters properly. As an algorithm becomes more sophisticated, it is usually
more difficult to set the parameters, because there are more of them. Such is
the case with LIMMIX. Even when pixels are limited to mixtures of two signatures,
there are three parameters to set. They are $\chi_1^2$, $\chi_2^2$, and the proportion
threshold. There is also the option of renormalizing the remaining proportions
after thresholding. In MIXMAP there were only the threshold t and one $\chi^2$
parameter (the alien object threshold) to set. The only known method for
establishing parameters is to run the algorithm on training data. A wide variety
of parameter combinations are used. The parameter set giving the closest estimate
of the training area ground truth is then used on the test area. It is also
difficult to set the parameters in the nine-point algorithm as explained in
Section 5. In Section 4.6, tests are made on LANDSAT data in order to devise
techniques for establishing parameters.
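
The training procedure described above is an exhaustive search over parameter combinations. The sketch below illustrates it for the two chi-square thresholds only, assuming each training pixel is summarized by a hypothetical record (d1_sq, p1, d2_sq, p2), where p1 and p2 are the target-class proportions under the best single class and the best pair; the proportion threshold and renormalization option are left out.

```python
from itertools import product


def estimate_fraction(records, chi1, chi2):
    """Estimated target fraction of the area for one threshold setting."""
    total = 0.0
    for d1, p1, d2, p2 in records:
        if d1 < chi1:
            total += p1
        elif d2 < chi2:
            total += p2
        # otherwise the pixel is alien and contributes nothing
    return total / len(records)


def choose_thresholds(records, truth, chi1_grid, chi2_grid):
    """Pick the setting whose estimate is closest to the ground truth."""
    return min(product(chi1_grid, chi2_grid),
               key=lambda c: abs(estimate_fraction(records, *c) - truth))


records = [(1.0, 1.0, 0.5, 0.9), (6.0, 0.0, 2.0, 0.5),
           (20.0, 0.0, 15.0, 0.3), (3.0, 0.0, 1.0, 0.2)]
print(choose_thresholds(records, 0.4, [2, 8], [4, 16]))
```

The selected setting would then be applied unchanged to the test area.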

4.6 PRELIMINARY TESTS 

Two data sets were chosen for preliminary testing of the LIMMIX algorithm:
(1) a water data set consisting of 20 generally small lakes and ponds in an
eight square mile area near Lansing, Michigan, and (2) a fourteen section
agricultural data set from Hill County, Montana.

The first data set was chosen for an initial test because water is a 
relatively high contrast target. Also, other algorithms had already been tried 
on the water data. This provided a basis for comparing the results that LIMMIX 
generated. 
The Hill County data was selected as the agricultural test of LIMMIX
for two reasons. The area's main crop is wheat, the target crop of the soon to
be implemented LACIE* [15] project, and the area contains many narrow fields.
The latter ensures that there will be numerous mixture pixels to exercise the
LIMMIX algorithm.

4.6.1 WATER DETECTION 

A water detection project [3] previously done with MIXMAP was
redone using LIMMIX. As before, the data set was divided into water and
non-water regions. The detection rate is defined as the area of water
found in the water region as compared to the area known from ground truth.
The false alarm rate is the area of water detected in the non-water region
divided by the area of that region. When the detection rate is plotted
against the false alarm rate, we obtain the so-called operating curves
of the algorithm.
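
The two rates and the resulting operating curve can be computed as in the following sketch. The function names and the areas in the example are made up for illustration and are not results from the report.

```python
def detection_rate(detected_area, true_area):
    """Water found in the water region relative to ground truth."""
    return detected_area / true_area


def false_alarm_rate(false_area, region_area):
    """Water detected in the non-water region relative to that region."""
    return false_area / region_area


def operating_curve(runs):
    """runs: (detected, true, false, non-water area) per parameter setting.
    Returns (false alarm rate, detection rate) points sorted for plotting."""
    return sorted((false_alarm_rate(f, r), detection_rate(d, t))
                  for d, t, f, r in runs)


print(operating_curve([(9.0, 10.0, 2.0, 100.0), (8.0, 10.0, 1.0, 100.0)]))
```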

Figures 8, 9, and 10 show the operating curves for MIXMAP
and LIMMIX. These curves represent the best performance of each algorithm,
and will be compared as such.

The MIXMAP graph (Figure 8) is for various rejection probabilities
and thresholds (water only). The thresholding, needed to cut down the
numerous false alarms, gives the best operating curves.

LIMMIX, on the other hand, thresholds all materials. Thresholding
all of the signatures will reduce the detections and false alarms. False
alarms are not as large a problem with LIMMIX due to the recognition portion
of the algorithm. The renormalization process, which increases the detections,
is therefore the preferred operating mode. The operating curves of LIMMIX
for various combinations of $\chi_1^2$ and $\chi_2^2$ values are presented in Figure 9.


*LACIE is a joint project for a Large Area Crop Inventory Experiment. LACIE
results will contribute to a future operational system for worldwide crop
inventory using remote sensing and computer technology.

[15] Large Area Crop Inventory Project Plan, November 1, 1974, NASA-NOAA-USDA
Report No. LAP 01, NASA/JSC, Houston, Texas.

[3] Nalepka, R.F., and P.D. Hyde, 1973, Estimating Crop Acreage From Space-
Simulated Multispectral Scanner Data, Report No. 31650-148-T, ERIM,
Ann Arbor, Michigan.
FIGURE 8. OPERATING CHARACTERISTICS OF PROPORTION ESTIMATION (3 CHANNEL) (MIXMAP)
A much higher detection rate with a smaller false alarm rate is
evident in the LIMMIX operating curves. The LIMMIX curves show that it
is possible to detect 100% of the water while having only about 0.5% false
alarms. MIXMAP is only able to detect about 93% of the water for the same
rate of false alarms. Figure 10 was included to show that even the
operating curves for LIMMIX without thresholding are the equal of those
for MIXMAP at its best.

4.6.2 ESTIMATING THE PROPORTION OF WHEAT 

LIMMIX was tried with wheat as the target crop. The data set selected 
was from Hill County, Montana. Its long, narrow fields create many mixture 
pixels making recognition difficult (Figure 11) . The purpose of the experiment 
was not to train parameters to be used on test data, but rather to see if LIMMIX 
had the capability of achieving good and consistent results. 

The data consisted of several different LANDSAT passes over Hill County. 

On the basis of previous unpublished results generated by NASA/JSC/Earth
Observation Division personnel, the July 16 pass was selected for 
processing. Unfortunately, the data tapes were unlabeled, and thus a 
considerable amount of effort was required to discover which data set corresponded 
most closely to known characteristics of the July 16 data (these characteristics 
were mean signal levels of various crops in two channels) . 

When the July 16 data set was identified the conventional process 
for identifying field location was carried out, i.e., various features 
were identified on a line-printer map of one channel, and then a regression 
fit was performed to determine the coordinate transformation from an aerial 
photograph to the data set. Signatures for the data set were then obtained. 

It would be difficult to get representative signatures from such 
narrow fields by conventional methods since many of them are less than 1 
pixel wide. For this reason it was decided to use a clustering algorithm 
to obtain the signatures. The equivalent of 5.5 sections was clustered 
(farms N^l, 2, 3, 5, 6, 7, 8, 14, 15, 16, 17) and 13 signatures were 
obtained. To show that they were indeed different, program EPLOT was run. 

The program plotted the mean and covariance matrix for each signature for 
3 pairs of channels (2 vs 1, 3 vs 2, 1 vs 4). The plots were examined 
and the signatures were found to be distinct. For the 78 combinations
of signature pairs, none of the covariance plots overlapped on all three
graphs, and only 9 pairs overlapped on two graphs (Figures 12, 13, 14).

FIGURE 14. HILL COUNTY SIGNATURES (CHANNEL 1 VS CHANNEL 4)

The next task was to correlate the signatures to crop types known to 
be in the scene. Recognition processing was run on Hill County for the 
13 signatures. The ground truth map and a recognition map were used to 
identify the clustered signatures. Three of the signatures were found 
to be wheat (Numbers 2, 6, 11). LIMMIX was then used to classify the Hill
County area. 

Mixtures of no more than two materials were used in processing Hill 
County. That two signatures at a time is the maximum which can be used can be
seen from Section 3.7. The reason for this is perhaps less clear, considering
that MIXMAP is capable of using one more signature than the number of channels
of data. Here is an explanation by example: For 2 channel data and a signature
set of three members, LIMMIX and MIXMAP can both consider mixtures of at most 3
object classes in a single pixel. When the set has four members, MIXMAP 
breaks down completely, since it must consider mixtures of four, and there can be 
many ambiguities. LIMMIX, of course, cannot calculate the best four at a time 
either, again because of the ambiguities; but it can find the best one and two 
at a time. The 3 at a time is a special case where there is usually just one 
ambiguity. Figure 15 shows four signatures in two channels. The data point 
(x) could represent a combination of signatures 1, 3, and 4 or 1, 2, and 3, since 
the likelihood for either is the same. It is for this ambiguity that three 
at a time must be discarded for LIMMIX. 

The criterion chosen for determining classification accuracy was the 
percentage of each material found in a relatively large area as compared to the 
true percentage of each material in that area. This was because the normal 
method of determining classification accuracy (testing field center pixels) is 
inappropriate for the LIMMIX algorithm, since much of its value lies in its
potential to deal with mixture pixels. 

LIMMIX was run on Hill County data using the 13 signatures for combinations 
up to two at a time. To save processing time, only 6 sections (N-1-8, 12, 13) 
were chosen for further analysis. 
Program ALIEN2 was run on the LIMMIX tape for a variety of $\chi_1^2$ and
$\chi_2^2$ values. The output, in number of pixels detected for each signature
and each pair of chi-squares, was compared with the ground truth to
obtain the detection rate for wheat over the 6 sections.

Due to the small field sizes, it was not possible to define non-wheat
areas and therefore to record false alarms by use of the ALIEN2 program,
and consequently the usual operating curves (i.e., detection rate vs. false
alarm rate) are replaced by a graph of detection rate vs. the chi-square
values. Results are presented in Figure 16. Since the graph does cross
100% detection (including false alarms), it was decided to use these
parameter values in two subset areas to test their universality. Small
areas 1 and 2 are defined in Figure 11. The results for the 2 smaller
areas are presented in Figures 17 and 18. These figures are the
same general shape as Figure 16 but are shifted along the detection rate
axis. Parameter settings of $\chi_1^2$ and $\chi_2^2$ where the detection rate is
100% for the 6 section area would give detection rates of 92% and 114%
for small areas 1 and 2. Even though we did not use separate test and training
regions, this preliminary experiment indicates that there may be parameter
settings which are approximately correct over subregions.
5

UTILIZATION OF SPATIAL INFORMATION IN ESTIMATING PROPORTIONS

Many current multispectral data processing schemes classify pixels on
the basis of their associated signals; the signals from neighboring pixels do not 
influence the outcome. But for many applications, schemes which take neighboring 
data into account would be expected to perform better than these single element 
rules. In addition, such schemes should make the distinction between pure and 
mixture pixels better than a single element scheme. 

Nine element rules are designed to gain these advantages while preserving 
simplicity and speed. Such rules are applied in turn to each pixel of the scene 
in the context of its eight immediate neighbors arranged in a 3 x 3 grid as 
diagrammed in Figure 19. These rules assume that when most of these nine pixels 
are assigned the same classification on a preliminary recognition pass, then the 
center pixel is unequivocally this material. When there is no clear consensus 
among these nine pixels, the center pixel may then be a mixture. Modest storage
requirements and the small number of pixels playing a role in each decision make
these rules practical.

After a study of investigations of nine-point rules by Richardson [7], the
voting rule was selected as the one most likely to detect boundary pixels.

The voting rule is applied after a preliminary recognition pass has been
made on the nine pixels. The center pixel is assigned the material recognized
most frequently among the nine if N or more pixels of the nine have been
recognized as that material (N is a parameter of the procedure). If no material
gets at least N votes, then the center pixel may be either a pure pixel or a
mixture pixel.

The advantage of the voting rule in proportion estimation is that a large
number of pixels contain a single material, and this rule detects most of them.
For these pixels, the procedure terminates after the vote. For the remainder of
the pixels, the voting rule provides contextual information which may be used
to determine which materials are present in a mixture.

[7] Richardson, W., 1974, A Study of Some Nine-Element Decision Rules,
Report No. 190100-32-T, Environmental Research Institute of Michigan,
Ann Arbor, Michigan.

5.1 NINE-POINT MIXTURES PROCEDURE

The voting rule was combined with the LIMMIX processing scheme. Three
algorithms were developed for testing. These are described below. Additional
details are contained in Appendix F.

5.1.1 ALGORITHM 1 

A. Make a preliminary pass through the data, classifying each pixel
according to the quadratic Bayes decision rule.

B. For each pixel, look at it and the adjoining eight pixels, and take a
'vote' as to their identity (pixels may participate in the vote only if their
associated Chi-Squared level is less than $\eta_1^2$). If at least $N_1$ of the pixels
agree as to identity, the center pixel is classified as this material.

C. If fewer than $N_1$ of the pixels agree as to identity, examine the
Chi-Squared level of the center pixel's classification. If this Chi-Squared level
is less than $\eta_2^2$, accept the recognition.

D. If the Chi-Squared level of the center pixel is greater than $\eta_2^2$, find
the two largest vote winners in the vote of (B). Call the pixel a mixture of
these two materials, i.e., if 4 pixels 'voted' for corn, 3 pixels 'voted' for
wheat, and 2 pixels 'voted' for soy, call the center pixel 4/7 corn and 3/7 wheat.
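
Steps B through D of Algorithm 1 can be sketched for a single 3 x 3 neighborhood as follows. This is a minimal illustration and not the report's program: the function name, the argument names, and the row-by-row layout with the center at index 4 are assumptions, and the report does not say how vote ties or a chi-squared level exactly equal to a threshold are handled, so those choices are arbitrary here.

```python
from collections import Counter


def algorithm_1(labels, chi2, n1, eta1_sq, eta2_sq):
    """Classify the center pixel of a 3 x 3 neighborhood (hypothetical helper).

    labels: nine preliminary class labels, row by row (index 4 is the center).
    chi2:   the nine chi-squared levels of those preliminary recognitions.
    Returns a dict mapping material -> proportion for the center pixel.
    """
    # Step B: only pixels whose chi-squared level is below eta1_sq may vote.
    votes = Counter(lab for lab, c in zip(labels, chi2) if c < eta1_sq)
    ranked = votes.most_common()
    if ranked and ranked[0][1] >= n1:
        return {ranked[0][0]: 1.0}

    # Step C: no consensus, so examine the center pixel's own recognition.
    if chi2[4] < eta2_sq:
        return {labels[4]: 1.0}

    # Step D: mixture of the two largest vote winners, weighted by their votes.
    top_two = ranked[:2]
    total = sum(count for _, count in top_two)
    if total == 0:
        return {}  # assumption: nobody could vote, so leave the pixel unclassified
    return {lab: count / total for lab, count in top_two}
```

Called with four corn, three wheat, and two soy recognitions, no consensus, and a center pixel that fails the $\eta_2^2$ test, the function returns 4/7 corn and 3/7 wheat, matching the example in step D.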

5.1.2 ALGORITHM 2

This is the same as Algorithm 1 except for step D, which becomes:

D. If the Chi-Squared level of the center pixel is greater than $\eta_2^2$, find
the best two-at-a-time mixture via the LIMMIX procedure.

5.1.3 ALGORITHM 3

This is the same as Algorithm 1 except for D, which becomes:

D. If the Chi-Squared level of the center pixel is greater than $\eta_2^2$, and
if the totals of the two largest vote winners in the vote of (B) are each greater
than or equal to $N_2$, the pixel is assumed to be a mixture of these two materials.
Find their proportions via the LIMMIX procedure (the signature set contains only
these two materials). If the total of at least one of the two largest vote
winners is less than $N_2$, find the best two-at-a-time mixture via the LIMMIX procedure
(all signatures are included in the signature set).
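
The choice made in step D of Algorithm 3 amounts to selecting which signature set is handed to LIMMIX. The sketch below shows only that selection; the function and argument names are hypothetical, and the LIMMIX call itself is not shown.

```python
from collections import Counter


def algorithm_3_signature_set(votes, n2, all_signatures):
    """Pick the signature set passed to LIMMIX in step D (hypothetical helper).

    votes: Counter of material -> number of votes from step B.
    n2:    minimum vote total each of the two leaders must reach.
    """
    top_two = votes.most_common(2)
    if len(top_two) == 2 and all(count >= n2 for _, count in top_two):
        # Both leaders have enough support: restrict LIMMIX to these two.
        return [material for material, _ in top_two]
    # Otherwise search all two-at-a-time mixtures over the full signature set.
    return list(all_signatures)
```

With at most nine voters and $N_2 = 4$, the restricted branch is taken only when the two leading materials together hold at least eight votes, which keeps the expensive full search for neighborhoods with no clear pair of candidates.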
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5.2 TEST RESULTS 


In order to determine which of the three algorithms performed the best, and
to determine proper parameter settings for each, these algorithms were tested
on three types of data sets: (1) A water data set from an eight square mile area
near Lansing, Michigan, consisting of 20 small lakes and ponds, which ranged in
size from seventy acres to one-third of an acre, averaging about 10 acres
(Section 5.2.1). (2) An agricultural data set, gathered 21 August 1973, with
target crops of corn and soybeans (one of the CITARS data sets), with training
and test data taken from a 5 x 20 mile area in Fayette County, Illinois (Section
5.2.3). (3) Two agricultural data sets with wheat as the target crop. The first
was a 14-section data set from Hill County, Montana with 6 of the sections taken
to be test data (Section 5.2.2). The second was a CITARS data set, gathered
10 June 1973, with training and test data taken from a 5 x 20 mile area in Fayette
County, Illinois (Section 5.2.4).

Preliminary testing was done on the water data set and on the Hill County
data set. These preliminary test results showed that the performance of
algorithm one was markedly inferior to that of conventional recognition, and
it was discarded. Algorithms two and three were found to perform approximately
the same in all cases, although algorithm three is preferable because of shorter
processing time. Consequently, only algorithm three was tested further, and
it will be referred to as 'the nine-point mixtures algorithm'.


Examination of the four parameters, $N_1$, $N_2$, $\eta_1^2$, and $\eta_2^2$, in algorithms
two and three showed that the best values of both $N_1$ and $N_2$ were invariant over
the data sets studied. $N_1$ was found to be optimum at eight, and severely
degraded performance resulted from any other setting. The optimal value of $N_2$
was found to be four.

The best settings of $\eta_1^2$ and $\eta_2^2$ vary from data set to data set, much as the
corresponding parameters do in LIMMIX. And as in LIMMIX, training is the only method
we now have for selecting parameter settings.


5.2.1 WATER DETECTION

The water data was the first set used for testing of nine-point mixtures.
Because conventional recognition and LIMMIX processing results were already
available for this data set, a direct comparison was possible. Figure 20 shows
a comparison of results for a range of parameter settings for LIMMIX and nine-point
mixtures processing, and the best parameter setting for conventional recognition
processing.

[Figure; axis label: DETECTION RATE (%)]

In the nine-point mixtures processing on the water data, it was found that
the results were quite sensitive to changes in $\eta_2^2$, while the results were almost
invariant for any $\eta_1^2$ greater than 25.

It can be seen from the figure that both LIMMIX and nine-point mixtures
performed better than conventional recognition. It is noteworthy that for
nine-point mixtures when the detection rate was 100%, the false alarm rate was
only about 0.8%. In addition, nine-point mixtures was quite accurate even on
a lake by lake basis.

In this test only three signatures were used, and we found that the speed
of processing with nine-point mixtures was approximately that of conventional
recognition. As the number of signatures increases, processing time of nine-point
mixtures increases more rapidly than that of conventional recognition. In a
production setup, the processing time of nine-point mixtures would be approximately

$$\frac{2}{3} + \frac{m!}{6\,(m-2)!}$$

times that of conventional recognition, where m is the number of signatures.

5.2.1.1 Comparison of Surface Water Detection Procedures

For purposes of comparison, the water data set was processed with two
other procedures. These procedures were developed at NASA [16], and NASA personnel
suggested that this comparison be made. One of them employs a two-channel
discriminant with a universal decision algorithm. The other uses a tailored
two-channel discriminant established with training data via procedures obtained from
the reference. It should be emphasized that these discriminant techniques were
developed for detecting all water bodies of 10 acres or more*, and they did just
that. Results are given for these discriminant methods as well as for nine-point
mixtures in Table 2.

[16] Anderson, A. C., 1973, Development of a Two-Channel Linear Discriminant
Function for Detecting and Identifying Surface Water Using ERTS-1 Data,
Report No. JSC-08449, NASA, Houston.

It is clear that nine-point mixtures is best for this scene insofar as
the number of lakes detected and % total water detected are concerned. However,
this procedure is much slower than the other two, by
about two orders of magnitude.

Signatures for water and non-water were obtained from a training area which 
comprised approximately 5% of the scene. These signatures are shown in Figure 21. 
In this figure, the universal discriminant obtained from the reference is shown 
in line 1. The tailored discriminant, shown as line 2, was drawn by eye. 

The universal discriminant requires no signature extraction or experimentation,
and is extremely rapid. This procedure was found to detect lakes of greater
than ten acres; however, it functioned erratically on lakes of significantly
smaller size. Overall accuracy was the lowest of the three in area determination.
It should also be mentioned that it found two lakes where there was actually one
narrow lake.

The tailored discriminant requires signature extraction and some experimentation
to determine the linear discriminant function. The speed of classification
is equal to that of the universal discriminant. Performance, however, was better
inasmuch as lakes of ten acres or more were again reliably found, but the
determination of lake size was more accurate. This procedure correctly identified
a narrow lake as just one lake instead of two.

Nine-point mixtures requires both signature extraction and experimentation 
to establish operating parameters. This requires more effort than the tailored 
discriminant. Nine-point mixtures detected all but one lake with an area of one- 
half acre or more while detecting a lake whose area was less than one-half acre. 
This procedure can be expected to reliably detect lakes of one acre or more. Area 
determination accuracy is also very high — the average error on each lake was less 
than one acre, with almost zero total error. The main disadvantage is processing 
time. 

*Report JSC 08449, table 8, page 7-4, documents the fact that the procedure was
developed according to the criteria required for the National Program for the
Inspection of Dams. These requirements were that the procedure must accurately
detect the existence of lakes of 10 acres or more. Further, the procedure was
not required to estimate sizes of water bodies.
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TABLE 2 

COMPARISON OF WATER DETECTION PROCEDURES

                          NO. OF LAKES       EQUIVALENT NO.
                          DETECTED           OF WATER
PROCEDURE                 (OUT OF TWENTY)    PIXELS FOUND      % DETECTION

Universal Discriminant         13                 162              67.1%
Tailored Discriminant          12                 193              79.9%
Nine-Point Mixtures            19                 245             101.4%


FIGURE 21. 2 CHANNEL LINEAR DISCRIMINANTS
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For each of the above procedures there were an insignificant number of 
false alarms. 

Figures 22, 23, and 24 are classification maps for the three procedures.
The tailored discriminant (Figure 23) fills out the lake areas more completely
than the universal discriminant (Figure 22) even though the latter detected one
more lake. The classification map for nine-point mixtures (Figure 24) shows how
this procedure not only detects interior water pixels (denoted by the symbol
M), but also delineates boundary, or mixture, pixels (denoted by the symbol *).

5.2.2 PRELIMINARY WHEAT DETECTION TEST 

The nine-point mixtures algorithm was applied to the Hill County data set,
where results of conventional recognition and LIMMIX processing were available for
comparison. Table 3 shows results of the three processing procedures on this
data. The conventional recognition shown is the quadratic rule with a
chi-squared rejection threshold. LIMMIX is shown employing its chi-squared
parameters and a proportion threshold t = .4 (proportions less than t are set
to zero, and the remaining proportions renormalized so that they sum to one).
The nine-point mixtures rule is shown using parameter values of $N_2 = 4$,
$\eta_1^2 = 30$, and $\eta_2^2 = 5$.
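
The proportion threshold used with LIMMIX can be illustrated by a short sketch. The function name and the dictionary representation of a pixel's proportions are assumptions made for the example; returning an empty result when nothing survives the threshold is also an assumption, since the report does not describe that case.

```python
def threshold_proportions(proportions, t=0.4):
    """Zero out proportions below t and renormalize the rest to sum to one.

    proportions: dict mapping material -> estimated proportion for one pixel.
    """
    kept = {material: p for material, p in proportions.items() if p >= t}
    total = sum(kept.values())
    if total == 0:
        return {}  # assumption: no material reaches the threshold
    return {material: p / total for material, p in kept.items()}
```

For a pixel estimated as 0.7 wheat and 0.3 fallow, the fallow share falls below t = .4 and the pixel becomes entirely wheat. Because the proportions sum to one, a threshold of .4 allows at most two materials to survive in any pixel.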

Table 3 compares the detection of wheat for the three procedures. Thirteen
signatures were employed. Three of these represented wheat. Detection rates
were obtained as follows. All the pixels in the test area were designated, using
ground truth information, as wheat or non-wheat. The detection rate is defined
to be the equivalent percentage of wheat pixels found by the procedure to be
wheat. The false alarm rate is defined to be the equivalent percentage of
non-wheat pixels found by the procedure to be wheat.
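
The two rates defined above can be sketched as follows. The reading of 'equivalent percentage' used here, in which fractional wheat proportions from mixture pixels are summed into equivalent whole pixels, is an interpretation and is not spelled out in the text; the function and argument names are hypothetical.

```python
def detection_and_false_alarm(is_wheat, wheat_fraction):
    """Return (detection rate %, false alarm rate %) for a test area.

    is_wheat:       ground-truth flags, one per pixel (True = wheat).
    wheat_fraction: estimated wheat proportion per pixel, between 0 and 1
                    (1.0 for a pure wheat recognition, 0.0 for no wheat).
    """
    n_wheat = sum(1 for w in is_wheat if w)
    n_other = len(is_wheat) - n_wheat
    # Equivalent wheat pixels found inside and outside the true wheat area.
    found_in_wheat = sum(f for w, f in zip(is_wheat, wheat_fraction) if w)
    found_in_other = sum(f for w, f in zip(is_wheat, wheat_fraction) if not w)
    return 100.0 * found_in_wheat / n_wheat, 100.0 * found_in_other / n_other
```

Under this reading a detection rate above 100% is possible only through the area totals, as in the water test, not through this per-pixel tally, so the sketch applies to the map-and-overlay comparison described for Table 3.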

The results in Table 3 were obtained by the use of classification maps and
overlays; as such they should be treated as close estimates, rather than exact
figures. For purposes of comparison, we note that field center recognition is
about 80% for wheat on this data. However, on all wheat pixels recognition was
only 63.4%. It can be seen from this table that the nine-point mixtures algorithm
shows itself superior to both conventional recognition and LIMMIX. LIMMIX, on
the other hand, is significantly better than conventional recognition as a wheat
detector.
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TABLE 3 

WHEAT DETECTION (HILL COUNTY)

Procedure           Detection Rate    False Alarm Rate

9-Point Mixtures        78.7%               4.4%
Recognition             63.4%              12.8%
LIMMIX                  71.4%               6.7%

5.2.3 ESTIMATING PROPORTIONS OF CORN AND SOYBEANS

Tests were conducted on Fayette County data employing a set of signatures
derived during the post-project analysis of the CITARS project results. These
signatures were obtained after breaking the data set into three parts: training
(20 quarter sections), pilot (10 sections), and test (10 sections). These three
parts contain 2880 pixels, 5630 pixels, and 5529 pixels, respectively. The
'other' signatures were those previously obtained from the training quarter
sections, while the signatures for corn and soybeans were obtained from the
pilot sections.

This was done because the corn and soybean fields in the training area
had been found to be unrepresentative of the corn and soybean fields in the
test data.


The parameters of the nine-point mixtures rule were then established on 
the training quarter-sections, using accuracy of area determination as the 
criterion for selecting the best parameters. The results of the effort to establish 
parameters are shown in Table 4. 

Nine-point mixtures was then used on the test sections with $\eta_1^2 = 20$ and
the value of $\eta_2^2$ selected from Table 4. Results were poor. Examination of
field center pixels showed a problem with misclassified 'other' pixels.
Investigation showed that the poor results were due principally to the fact
that there was no rejection threshold used for mixtures. To correct this, a
third chi-squared parameter, $\eta_3^2$, was added to the algorithm; it sets a
rejection threshold on mixtures, as the corresponding parameter does in
LIMMIX. Mixtures which are rejected are called 'other'. With this addition, the
parameters were again established on the training quarter-sections. The results
obtained are shown in Table 5.

The best settings of the parameters from Table 5 (with $\eta_1^2 = 20$) were
employed on the test sections, but again the results were poor.

The parameters were then established on the larger set consisting of both
the pilot sections and training quarter-sections. Results are shown in Table 6.

A selection of four parameter settings, including the best settings of the
parameters ($\eta_1^2 = 20$, $\eta_2^2 = 2.5$, $\eta_3^2 = 2.5$), was then used on the
test sections.

TABLE 4

RESULTS OF ESTABLISHING NINE-POINT MIXTURE PARAMETERS

  PARAMETER SETTINGS          PROPORTION ESTIMATES (%)

$\eta_1^2$    $\eta_2^2$           Corn       Soybeans

   20          2.5             26.4        45.56
   20          5               22.52       47.02
   20          7.5             18.05       50.66
   20         10               15.76       53.43

   Ground Truth                23.53       45.31




TABLE 5

RESULTS OF ESTABLISHING NINE-POINT MIXTURES PARAMETERS
(Training Quarter-Sections Only)

      Parameter Settings               Proportion Estimation (%)

$\eta_1^2$    $\eta_2^2$    $\eta_3^2$         Corn       Soybeans

   20          5           2.5          18.12       43.83
   20          5           5            22.96       45.47
   20          2.5         2.5          18.76       40.47
   20          2.5         5            26.46       43.35

   Ground Truth                         23.53       45.31
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TABLE 6 

RESULTS OF ESTABLISHING NINE-POINT MIXTURE PARAMETERS
(Training Quarter-Sections Plus Pilot Sections)

      Parameter Settings               Detection (%)

$\eta_1^2$    $\eta_2^2$    $\eta_3^2$         Corn       Soybeans

   20          5           2.5          21.75       38.13
   20          5           5            18.30       28.75
   20          2.5         2.5          21.50       34.88
   20          2.5         5            28.99       36.85

   Ground Truth                         24.54       33.63
The "best" parameters ($\eta_1^2 = 20$, $\eta_2^2 = 2.5$, $\eta_3^2 = 2.5$) yielded excellent results
when used on the test sections, as shown in Table 7.

Why then were results so poor when parameters were established only on the
training quarter-sections? We know, based on detailed examinations, that the corn
and soybean fields in the training region are not adequately representative of
the corn and soybean fields in the test region. This is why the corn and soybean
signatures were obtained from the training quarter-sections plus pilot sections,
rather than from the training sections alone. We believe that this is the
explanation for the poor results obtained when the parameters were established on
the training quarter-sections alone.

An analysis of the error was made in order to establish the consistency of 
nine-point mixtures as an estimator of crop proportions. To do this the RMS 
error between nine-point mixtures crop proportion estimate and ground truth 
proportions was computed for each of the 10 test sections (averaging 553 pixels 
each) . The RMS error between the true percentage corn and the estimated percentage 
corn over the ten test sections was 3.53(%). For soybeans the corresponding 
figure was 4.33(%). 
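
The RMS error over sections can be computed as in the sketch below. The function name is hypothetical, and the numbers in the usage line are made-up illustrations, not the report's per-section data, which are not listed in the text.

```python
import math


def rms_error(estimated, truth):
    """Root-mean-square difference between per-section estimates and ground truth.

    estimated, truth: percentages of one crop, one value per test section.
    """
    if len(estimated) != len(truth) or not estimated:
        raise ValueError("need two non-empty sequences of equal length")
    squared = [(e - t) ** 2 for e, t in zip(estimated, truth)]
    return math.sqrt(sum(squared) / len(squared))


# Illustrative values only: two sections with errors of +3 and -4 percentage points.
example = rms_error([14.0, 30.0], [11.0, 34.0])
```

Because each section averages only about 553 pixels, a small RMS error here indicates that the estimator stays accurate over small areas and not merely in the overall total.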

5.2.4 ESTIMATING PROPORTIONS OF WHEAT 

Another test of nine-point mixtures on a data set with a target crop of wheat
was made using the CITARS data set of 10 June 1973 on a 5 x 20 mile area of
Fayette County, Illinois. There were twenty training quarter sections containing
a total of 2880 pixels and nineteen test sections containing a total of 10,223 pixels.

Because of time limitations, it was decided that we should at first limit
ourselves to the two best values of $\eta_1^2$, $\eta_2^2$, and $\eta_3^2$, as indicated by the corn and
soybean test. When using these two parameter settings, an attempt to establish
parameter settings on the training data was made. The results of this test are
given in Table 8. The parameter setting of $\eta_1^2 = 20$, $\eta_2^2 = 2.5$, $\eta_3^2 = 2.5$ gives the closer
estimate.

Examination of these results from the training area showed that most of the
errors came from wheat recognitions in hay and summer fallow fields. As the
$\eta_1^2$ level for accepting a pixel into the vote was rather large, it appeared that
decreasing $\eta_1^2$ should help the results. Parameter settings with $\eta_1^2 = 14$ and $\eta_1^2 = 7$ were
then tried on training data, and the results of this test are given in Table 9.
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TABLE 7 

PROPORTION ESTIMATION ON TEST DATA
(Fayette County, August 21, 1973)

      Parameter Settings               Proportion Estimation (%)

$\eta_1^2$    $\eta_2^2$    $\eta_3^2$         Corn       Soybeans

   20          5           2.5          13.56       33.58
   20          5           5            20.12       37.30
   20          2.5         2.5*         15.85       31.06
   20          2.5         5            24.56       33.63

   Ground Truth                         14.16       31.41

*Note: The parameter set established by training
gives the best results on the test sections.
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TABLE 8 


PROPORTION ESTIMATION ON TRAINING DATA

      Parameter Settings               Proportion Estimate (%)

$\eta_1^2$    $\eta_2^2$    $\eta_3^2$         WHEAT

   20          2.5         5.0          23.3
   20          2.5         2.5          18.1

   Ground Truth                         13.1
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TABLE 9 

PROPORTION ESTIMATION ON TRAINING DATA (CONTINUED)

      Parameter Settings               Proportion Estimate (%)

$\eta_1^2$    $\eta_2^2$    $\eta_3^2$         WHEAT

   14          2.5         5.0          22.5
   14          2.5         2.5          16.5
    7          2.5         5.0          21.5
    7          2.5         2.5          13.5

   Ground Truth                         13.1
These parameter settings and results were then graphed. Extrapolation from
this graph indicated that the correct parameter settings would be $\eta_1^2 = 6$,
$\eta_2^2 = 2.5$, $\eta_3^2 = 2.5$. The results for this test are shown in Table 10. The results on the
test data were then computed for each of the parameter settings previously tried.
These results are given in Table 11. A graph of the training and test results
plotted against the parameters is shown in Figure 25.

5.3 DISCUSSION 

Analysis of the results obtained by nine-point mixtures reveals that:
(1) Nine-point mixtures has performed significantly better than conventional
recognition as a crop proportion estimator for each data set examined.
(2) Nine-point mixtures has shown itself capable of extremely accurate crop
proportion estimation on one agricultural data set (Section 5.2.3). (3) On the other
agricultural data set (Section 5.2.4), while nine-point mixtures performed better
than conventional recognition, it is clear that better methods of setting the
parameters should be investigated. (4) Nine-point mixtures has shown itself
capable of extremely accurate proportion estimation of water, even with very
small (.3 acre) lakes. (5) Nine-point mixtures appears to be consistent in this
respect: it retains much of its accuracy even over small areas, as indicated by
both the corn and soybean test and the water test. (6) Nine-point mixtures
is comparable to conventional recognition in processing time for a reasonable
number of signatures.
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TABLE 10 

PROPORTION ESTIMATION ON TRAINING DATA
(CONTINUED)

      Parameter Settings               Proportion Estimate (%)

$\eta_1^2$    $\eta_2^2$    $\eta_3^2$         WHEAT

    6          2.5         2.5          12.6

   Ground Truth                         13.1

TABLE 11

PROPORTION ESTIMATION ON TEST DATA

      Parameter Settings               Proportion Estimate (%)

$\eta_1^2$    $\eta_2^2$    $\eta_3^2$         WHEAT

   20          2.5         5.0          32.7
   20          2.5         2.5          26.6
   14          2.5         5.0          29.3
   14          2.5         2.5          22.1
    7          2.5         5.0          23.1
    7          2.5         2.5          12.4
    6          2.5         2.5          11.4

   Ground Truth                          9.0


FIGURE 25. VARIATION OF THE PROPORTION ESTIMATE WITH $\eta_1^2$ IN THE 9-POINT
MIXTURE ALGORITHM (FAYETTE COUNTY DATA, 10 JUNE 73)
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6 

CONCLUSIONS AND RECOMMENDATIONS 

Results of tests performed on LANDSAT data sets show that the LIMMIX and
nine-point mixtures processing schemes offer significant improvement over
both conventional recognition and MIXMAP processing. The reason for this
superior performance seems to stem from the incorporation of prior information
about mixture pixels and their spatial arrangement. The reduced number of
spectral channels required for these procedures offers a further advantage over
MIXMAP. For these reasons we believe that further testing of these newer
concepts is warranted. In addition, reevaluation of data averaging should be
considered.

The attainment of superior performance via LIMMIX and nine-point mixtures
is possible only when the parameters of these procedures are correctly set;
thus the problem of setting these parameters warrants further study. This is
especially true for nine-point mixtures because of its greater number of
parameters.

Analysis of signatures shows they are often clearly multimodal, and the
employment of unimodal signatures may degrade performance severely. This
indicates that the effect of using several unimodal signatures to
represent a single object class should be investigated in conjunction with these newer
proportion estimation procedures. The possibility of doing this with MIXMAP
is limited because of the restriction on the size of the signature set permitted
relative to the number of spectral channels.
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APPENDIX A 

ESTIMATION OF CORRELATION FUNCTION 


The computation of the estimate of the correlation function for a
single field and single element channels is as follows. For simplicity, we
assume that the field center pixels form a rectangular grid with line numbers
$L_1, L_1+1, \ldots, L_1+N_L-1$ and point numbers $P_1, P_1+1, \ldots, P_1+N_P-1$.
The signal at coordinates (L,P) is denoted by X(L,P). The sample mean along
point P is denoted by $\bar X_P$ where

$$\bar X_P = \frac{1}{N_L} \sum_{L} X(L,P)$$

The sample variance along point P is denoted by $S_P^2$ and is computed by

$$S_P^2 = \frac{1}{N_L} \sum_{L} [X(L,P) - \bar X_P]^2 = \frac{1}{N_L} \sum_{L} X^2(L,P) - (\bar X_P)^2$$

Then an estimate $\hat R(j)$, $0 \le j \le N_L - 1$, of the correlation function R(j) for the field,
crop type, and channel is taken as:

$$\hat R(0) = R(0) = 1$$

$$\hat R(j) = \frac{1}{N_P (N_L - j)} \sum_{P=P_1}^{P_1+N_P-1} \frac{1}{S_P^2} \sum_{L=L_1}^{L_1+N_L-1-j} [X(L,P) - \bar X_P]\,[X(L+j,P) - \bar X_P]$$

If we transform the data by

$$Y(L,P) = \frac{X(L,P) - \bar X_P}{S_P}$$

then we have

$$\hat R(j) = \frac{1}{N_P (N_L - j)} \sum_{P} \sum_{L} Y(L,P)\, Y(L+j,P)$$

The transformation from X to Y may be thought of as correcting for a simultaneous
multiplicative and additive scan angle effect.

Now let us assume that there are K fields and that the estimated correlation
function for the k-th field is denoted by:

$$\hat R_k(j), \quad 1 \le k \le K, \quad 0 \le j \le N_{L,k} - 1$$

Let the average of the correlation functions over the K fields be denoted by
$\bar R(j)$, where j ranges over $0 \le j \le N_{\min} - 1$ with

$$N_{\min} = \min_k N_{L,k}$$

Then

$$\bar R(0) = 1$$

$$\bar R(j) = \frac{1}{K} \sum_{k=1}^{K} \hat R_k(j), \quad 1 \le j \le N_{\min} - 1$$
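
The estimate described in this appendix can be sketched for one field and one channel as follows. This is an illustration under stated assumptions: the normalizing factor is not legible in the source, and the sketch divides by the number of products, $N_P (N_L - j)$, which is a guess. The function names and the list-of-rows layout are also hypothetical, and a point with zero variance would need special handling that is omitted here.

```python
import math


def correlation_estimate(x, j):
    """Estimate the line-to-line correlation at lag j for one field.

    x: list of lines, each a list of signal values by point, so x[l][p].
    j: lag in lines, 0 <= j <= number of lines - 1.
    """
    n_l, n_p = len(x), len(x[0])
    if j == 0:
        return 1.0  # defined as 1 in the appendix
    total = 0.0
    for p in range(n_p):
        column = [x[l][p] for l in range(n_l)]
        mean = sum(column) / n_l
        var = sum((v - mean) ** 2 for v in column) / n_l
        # Transform X to Y: remove the point's mean, divide by its std deviation.
        y = [(v - mean) / math.sqrt(var) for v in column]
        total += sum(y[l] * y[l + j] for l in range(n_l - j))
    # Assumed normalization: average over all products that were formed.
    return total / (n_p * (n_l - j))


def average_correlation(fields, j):
    """Average the per-field estimates at lag j (j below the shortest field length)."""
    return sum(correlation_estimate(f, j) for f in fields) / len(fields)
```

A field whose signal alternates from line to line gives an estimate of -1 at lag one and +1 at lag two, which is a quick sanity check on the sign and scaling conventions.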
APPENDIX B

DESCRIPTION OF LIMMIX

The following is a step by step explanation of the LIMMIX algorithm. The
step numbers correspond to the accompanying flowchart (Figure 26). Also
included is a list of program variables.

EXPLANATION OF ALGORITHM

1. The mean (A) and covariance matrix (M) for each signature is entered
   and stored. A is an n x 1 matrix and M is n x n (n = number of channels).

2. To save time, four frequently called terms are precomputed and stored.

   A. $\bar M_c^{-1}$

      The inverses of the average of the covariance matrices for all
      combinations of up to 4 at a time are calculated. The subscript c
      designates the combination number. For combinations of one at a
      time, c goes from 1 to m (m = number of signatures). The combination
      numbers for 2 at a time begin at m+1 and are in this order: signatures
      1 and 2, 1 and 3, ..., 1 and m, 2 and 3, 2 and 4, ..., etc.

   B. $\bar M_c^{-1} A_i$

      Each of the previously stored matrices is multiplied by each
      of the A (mean) vectors which correspond to the component matrices
      used to compute each $\bar M_c^{-1}$ matrix.

      Example: $\bar M_{2,5}^{-1}$ is multiplied by $A_2$ and the result stored.
      Then $\bar M_{2,5}^{-1}$ is multiplied by $A_5$ and the result stored.

      Each of these products results in an n x 1 matrix.

   C. $\Gamma^{*-1}$

      First, the $\Gamma$ (gamma) matrix is calculated. Here is an example
      where the matrix combination containing covariances of signatures one,
      two, and four is used. $\bar M^{-1}$ will stand for $\bar M_{1,2,4}^{-1}$.

      $$\Gamma = \begin{pmatrix}
      A_1^T \bar M^{-1} A_1 & A_1^T \bar M^{-1} A_2 & A_1^T \bar M^{-1} A_4 \\
      A_2^T \bar M^{-1} A_1 & A_2^T \bar M^{-1} A_2 & A_2^T \bar M^{-1} A_4 \\
      A_4^T \bar M^{-1} A_1 & A_4^T \bar M^{-1} A_2 & A_4^T \bar M^{-1} A_4
      \end{pmatrix}$$

      $\Gamma$ is an m x m matrix, independent of the number of channels. (Each
      $A_i^T \bar M^{-1} A_j$ is a 1 x 1 matrix, or just a number.)

      $\Gamma^*$, the augmented matrix, is the $\Gamma$ matrix with an extra row
      and column of ones, and a zero in the m+1, m+1 position.

      Example:

      $$\Gamma^* = \begin{pmatrix} \Gamma & \mathbf{1} \\ \mathbf{1}^T & 0 \end{pmatrix}$$

      The inverse of the above matrix is stored for all the combinations.

   D. $\ln|\bar M_c|$

      As a byproduct of taking the inverse of the average covariance
      matrix, the determinant is computed. The natural logs of these
      determinants are stored for use later in the likelihood and
      chi-squared calculations.

3-6. The likelihood, chi-squared, and proportion vector storage bins are
   given initial values. The data vector (n x 1) from the first pixel is
   multiplied by the transpose of the first $\bar M^{-1} A$ vector (1 x n) to yield g.
   For the first m calculations, there is one $\bar M^{-1} A$ and g is just a number.
   When $\bar M^{-1} A$'s are called from storage in sequences where two or more A's are
   multiplied by a common $\bar M^{-1}$, g is a vector whose length is the number of
   components of $\bar M$.

7. The g vector is augmented and then multiplied by the $\Gamma^{*-1}$ matrix that used
   the same $\bar M_c$ matrix. The product gives the proportion vector of the
   component signatures and a final element which serves the role of a
   Lagrange multiplier.

8-10. If any one of the proportions is negative, that signature set is rejected
   as a solution and the next $\bar M^{-1} A$ and $\Gamma^{*-1}$ matrices are used to find a new
   $\lambda$ vector. When an all positive proportion vector is found, the likelihood
   and chi-squared values are calculated.

11-14. If this calculated likelihood is greater than (i.e., less likely) the
   stored value ($\alpha_0$), the signature set is rejected as a solution. However,
   if $\alpha_0 > \alpha$, the new values for the likelihood, chi-square, and proportion
   vector are substituted into temporary storage. When the level is complete
   (i.e., when all the one at a time or two at a time etc. combinations have
   been looked at) $\alpha$, $\chi^2$, and the $\lambda$'s are stored to be output on tape later.
   If the level has not been completed, the matrices for the next combination
   of signatures are brought from storage and the calculations, starting from
   step 6, are done again.

15-17. When a level has been completed, the values that will be calculated in
   the next level are initialized with those calculated from the previous
   level. This is not really a step, since the winning values from the
   previous level are already in temporary storage (step 13).

   Step 16 is included to clarify the fact that the values are not
   initialized to the same numbers in the succeeding levels as they were
   in the first (step 5). This method of initialization prevents solutions
   at the i + 1 level being less likely than the solutions at the i level,
   e.g., the most likely two at a time may be a recognition with the second
   proportion equal to zero.

   When all four (or less) levels are complete, the likelihood, chi-square
   and proportion vector for each level are put on tape, and the algorithm
   proceeds to the next pixel.
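
The proportion calculation in step 7 can be sketched for one combination of signatures as below. Several things are assumptions here and not taken from the report: the helper names, the use of a linear solve in place of a stored inverse, the reading of the gamma matrix entries as each mean vector paired with the inverse average covariance and another mean vector, and the value 1 used to augment g (the text only says that g is augmented). The code is plain Python so that it stands alone.

```python
def solve(a, b):
    """Solve a x = b by Gauss-Jordan elimination with partial pivoting."""
    n = len(b)
    aug = [[float(v) for v in row] + [float(rhs)] for row, rhs in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        if abs(aug[pivot][col]) < 1e-12:
            raise ValueError("singular system")
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for r in range(n):
            if r != col:
                f = aug[r][col] / aug[col][col]
                aug[r] = [v - f * p for v, p in zip(aug[r], aug[col])]
    return [aug[i][n] / aug[i][i] for i in range(n)]


def dot(u, v):
    return sum(p * q for p, q in zip(u, v))


def mixture_proportions(means, avg_cov, x):
    """Return (proportions, multiplier) for one combination of signatures.

    means:   mean vectors A_i of the component signatures.
    avg_cov: average covariance matrix of the combination.
    x:       data vector of the pixel.
    """
    k = len(means)
    minv_a = [solve(avg_cov, a) for a in means]          # term B: M^-1 A_i
    gamma = [[dot(means[i], minv_a[j]) for j in range(k)] for i in range(k)]
    g = [dot(x, minv_a[i]) for i in range(k)]            # step 6
    # Term C: extra row and column of ones, zero in the corner.
    gamma_star = [row + [1.0] for row in gamma] + [[1.0] * k + [0.0]]
    solution = solve(gamma_star, g + [1.0])              # step 7
    return solution[:k], solution[k]
```

A caller would reject the combination when any returned proportion is negative, as in steps 8-10, and move on to the next combination. Solving the augmented system forces the proportions to sum to one, which is why the last element behaves as a Lagrange multiplier.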
FIGURE 26. LIMMIX FLOW CHART

Legend:
   A            Signature mean vector
   M            Signature covariance matrix
   X            Data vector
   $\alpha$     Likelihood
   $\chi^2$     Chi-square
   $\lambda$    Proportion vector
   c            Combination number (subscript)

Recoverable box labels: Precompute and Store; Pick Data Point X; Choose First
... ; Choose Next ... in Sequence.
Recoverable relation: $\chi^2 = \alpha - \ln|\bar M|$.
LIMMIX. is a module of POINT., a program whose main function is to transfer
data points to its modules in accordance with control data. This control data
specifies the ground area to be processed. The format of POINT. is such that
modules may be called at several different stages of the data processing
procedure. At step one of POINT., before any data has been read in, LIMMIX.
control variables are read, initialization is completed, and pre-computations are
done. In step two, the number of output channels is set to 22. This is after
the POINT. control variables have been read but before any processing is done.
Lastly, POINT. calls the processing part of LIMMIX. (Flowchart Step 3 to the end)
for each data point until the area is complete. The next page contains a list
of the LIMMIX control variables.


CALLING SEQUENCE:   $COMPILE MAD, EXECUTE
                    POINT. (LIMMIX.)
                    S’M
                    $ BINARY
                    LIMMIX. BINARY DECK
                    $ DATA
                    READ AND PRINT DATA CARD(S) FOR LIMMIX.*
                    SIGNATURES
                    INBIN,OUTBIN,FILE,OFILE, . . .
                    NSA'S (OR POLYGON COORDINATES)*


*POLYGON coordinates have been run with LIMMIX. on the ERIM ERTS project
with no apparent errors.
LIMMIX VARIABLES

VARIABLE   DEFAULT      RESET TO    EXPLANATION
                        DEFAULT
                        AFTER
                        RESTART?

NSIG(2)    None         No          Number of signatures (each with NV
                                    channels; subsets of channels not
                                    allowed for signatures).

ND         None         No          Maximum number of channels on the
                                    data tape.

NV(2)      None         No          Number of channels on the data tape
                                    that will be used (NV = NCHAN).

ICODE      2,3,4,...    No          Which channels on the data tape.
                                    Example: ICODE(1) = 1,2,4

RANK(2)    4            Yes         Maximum number of signatures considered
                                    in the identification of each pixel.
                                    RANK must be <4
                                    (RANK = NCOMP (ALIEN 2)).

SCALE      -1           Yes         Scale factor for $\chi^2$ (chi-squared).

MODE       0            Yes         = 0 means read the first NSIG signatures.
                                    = 1 means search tape for NSIG signatures
                                      whose names are read, one to a card, by
                                      2C6* at the time of the search.
                                    = 2 means search tape but use the previous
                                      name list.
                                    = -1 means return without reading new
                                      signatures. Do not use this option
                                      in LINMAP or CLASFY.

GC         0            Yes         = -1 print nothing.
                                    = 0 print the i.d. card only.
                                    = a character. Print the i.d. card, mean
                                      vector and covariance matrix with GC as
                                      the carriage control character for the i.d.

(2) The values that are set for NSIG, NV, and RANK are interrelated. For
a given NSIG and NV, there is a maximum setting for RANK. The following table
shows the relationship.

      RANK < NV+1   when RANK = NSIG

   and

      RANK < NV     when RANK ≠ NSIG
APPENDIX C

DESCRIPTION OF ALIEN2

This program, written in the programming language MAD as implemented
on an IBM 7094 computer, performs the analysis algorithm for LIMMIX.

This program accepts the output of LIMMIX and produces the following
output:

1. How many pixels were classified by recognition, mixtures with
   2 signatures, mixtures with 3 signatures, mixtures with 4
   signatures, and the number unclassified.

2. The amount of each material classified by each of the above
   methods.

3. The amount of each material classified by each of the above
   methods, but as a fraction of the total number of pixels.

4. The amount of each material classified by each of the above methods,
   but as a fraction of the total number of classified pixels.

5. The amount of each material from recognition, recognition plus
   2 signature mixtures, recognition plus 2 and 3 signature mixtures,
   and the total amount of each material.

6. The mean square error (in pixels) of the subject material, both
   for each area and the sum of all of the areas.

7. The percent mean square error of the subject material, both
   for each area and the sum of all the areas.

Portions of this output can be suppressed.

The program is a module of POINT, a program which provides data
points to its modules in accordance with control data (NSA cards) and in
a rigid format which includes calls to all of the modules before any control
data is read, after control data is read, before each line of data is read,
and after each area has been processed. For each data point, a call to
the internal function of each module is made for processing of data.





The program ALIEN2 is organized as follows:

STEP(1) - (This step is called before POINT's control data is read.)
          Setup and initialization; obtain control variables.

STEP(2) - (POINT calls this after initial control data is read.)
          Zero sums of number of pixels of each class if starting
          new region.

STEP(5) - (POINT calls this after each area is processed.) If this
          wasn't the last area to be combined into one region,
          return to POINT. Otherwise compute and print out statistics.

Internal Function PSUM - (POINT calls this for each data point.) It
          is here that the decision as to whether a point is one material,
          two materials or more is made, and here the thresholding
          and/or renormalization is done, and finally the pixel is
          added to the running sum of the number of pixels of each
          material.

The variables J(1) through J(4) (in the THROUGH loops, lines 106-109)
correspond to the variables $\chi_1^2$, $\chi_2^2$, $\chi_3^2$, $\chi_4^2$ of
LIMMIX, which are used as thresholds (in lines 114 and 115) to decide whether
the pixel is one material, two, or alien. If the pixel is alien, the alien
COUNT is increased (in line 117); otherwise, the correct N-materials-at-a-time
count is incremented in line 119.
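
The chi-squared decision just described can be sketched as below. This is an illustrative Python reading, not a transcription of the MAD code: it assumes the levels are tested in order (one material first, then two, and so on), as the chi-squared branch of Figure 28 suggests, and the function names are hypothetical.

```python
def classify_by_chi_squared(chi2, thresholds):
    """Return k (1..n) for the first level whose chi-squared value is below
    its threshold J(k); return 0 when every test fails (alien pixel)."""
    for k, (value, limit) in enumerate(zip(chi2, thresholds), start=1):
        if value < limit:
            return k
    return 0


def tally(pixels, thresholds):
    """Count pixels per class; index 0 holds the alien count."""
    counts = [0] * (len(thresholds) + 1)
    for chi2 in pixels:
        counts[classify_by_chi_squared(chi2, thresholds)] += 1
    return counts
```

For example, with all four thresholds set to 5, a pixel whose chi-squared values are (9, 2, 9, 9) is counted as a two-material mixture.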

Then, in lines 121 to 129, the combination of materials is decoded,
and the proportions of these materials are stored in the OUT array.

At this point, either thresholding or thresholding with renormalization
is done to the proportions (in lines 136 to 138 or 139 to 150, respectively),
and finally these proportions are added to the SUM array, which holds the
accumulated totals for each material. Optionally, a likelihood weighting
can be used as a decision rule, and this is done in lines 151-162. See
Figure 27 for the ALIEN2 flowchart.
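
A minimal Python sketch of the two options follows. It assumes that "thresholding" zeroes any proportion below the threshold TAU and that renormalization rescales the survivors to sum to one, as the control-variable descriptions state; the function names are hypothetical, and the handling of the case where every proportion is zeroed is an assumption.

```python
def threshold(proportions, tau):
    """Set every proportion smaller than tau to zero."""
    return [p if p >= tau else 0.0 for p in proportions]


def threshold_and_renormalize(proportions, tau):
    """Threshold, then rescale the remaining proportions to sum to one."""
    kept = threshold(proportions, tau)
    total = sum(kept)
    if total == 0:
        # Assumption: if nothing survives, leave the zeros untouched.
        return kept
    return [p / total for p in kept]
```

With proportions (0.05, 0.35, 0.60) and TAU = 0.1, thresholding alone leaves (0, 0.35, 0.60), while renormalization rescales the last two so that they again sum to one.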

The arrays COUNT and SUM are indexed by the variable Z, which is
incremented each time the data point has been processed by a set of
parameters, and thus the proportion of each material (in SUM) and the count
of how many pixels were pure, or of two materials, etc. (in COUNT) is kept
separate for each parameter setting. Further, the array SUM is indexed
by K, the number of materials in that pixel, so that the amount of pure
material, two material mixtures, etc., is kept separate for each
parameter setting and each material.
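
The bookkeeping per parameter setting can be illustrated with the following Python sketch. It assumes that each threshold index runs from its starting value to its final value inclusive, in the given increment, and that the chi-squared levels are tested in order; the names `parameter_settings` and `sweep_counts` are hypothetical.

```python
from itertools import product


def parameter_settings(start, step, end):
    """Enumerate every (J(1), ..., J(n)) combination, end values inclusive."""
    axes = [range(s, e + 1, st) for s, st, e in zip(start, step, end)]
    return list(product(*axes))


def sweep_counts(pixels, settings):
    """counts[z][k] = number of pixels classed as k materials under
    setting z, where k = 0 means alien."""
    counts = []
    for thresholds in settings:
        row = [0] * (len(thresholds) + 1)
        for chi2 in pixels:
            # First level whose chi-squared value is under its threshold.
            k = next(
                (i for i, (v, t) in enumerate(zip(chi2, thresholds), start=1) if v < t),
                0,
            )
            row[k] += 1
        counts.append(row)
    return counts
```

Each row of the result plays the role of one value of the setting index, so results for different threshold combinations never mix.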

Control variables for this program are as follows:

VARIABLE   POSSIBLE   DEFAULT   FUNCTION
           VALUES

THRESH                          When THRESH=$ON$, any proportion less
                                than TAU* is set to zero.

NORMAL                          When NORMAL=$ON$, any proportion less
                                than TAU* is set to zero. Then all
                                remaining proportions are re-normalized
                                to sum to one.

LIKELY                          When LIKELY=$ON$, the likelihood
                                decision rule is used, i.e., the
                                chi-squared levels are weighted*, and the
                                minimum is decided upon (see description of
                                LIMMIX output tape). When LIKELY=$ON$, P*
                                must be specified. When LIKELY is
                                not $ON$, the chi-squared decision rule
                                is used.

HAFOUT                          When HAFOUT=$ON$, output items (1)-(4)
                                are not calculated.

ERROUT                          When ERROUT=$ON$, output items (5) & (6)
                                are calculated. When ERROUT=$ON$, TRUTH*
                                and CHAN* must be specified.

NCOMP      1-4                  A maximum of NCOMP signatures per
                                mixture was used.

NSIG                            Number of signatures.

TAU                             Thresholding value; see NORMAL*.


*See Specification of Variable
VARIABLE      POSSIBLE   DEFAULT   FUNCTION
              VALUES

START(1-4)    INTEGER    1         Starting value of index. See Figure 28.

STP(1-4)      INTEGER    1         Increment of index. See Figure 28.

JEND(1-4)     INTEGER    1         Final value of index. See Figure 28.

CHAN          1-9        1         Material or signature under consideration
                                   in calculating mean square error.

P(1-4, 1-21)  any        0         Weights used in likelihood decision rule,
                                   indexed by START*, STP*, JEND*; see
                                   Figure 28.

TRUTH         any        Garbage   The amount of material under consideration
                                   for each area.

LCUT(1-4)     any        1000      Alien thresholds for likelihood
                                   decision rule.

SUMOUT        $ON$       off       When SUMOUT=$ON$ only output item
                                   (4) is calculated.


*See Specification of Variable

















































DO FROM J(1) = START(1) TO J(1) = JEND(1) BY STEPS OF STP(1)
  DO FROM J(2) = START(2) TO J(2) = JEND(2) BY STEPS OF STP(2)
    DO FROM J(3) = START(3) TO J(3) = JEND(3) BY STEPS OF STP(3)
      DO FROM J(4) = START(4) TO J(4) = JEND(4) BY STEPS OF STP(4)

LIKELIHOOD RULE
  TAKE MINIMUM OF L(I) + P(I,J(1)) FOR I = 1, 2, 3, 4
  IS χ_I < LCUT(I)?  YES: CLASSIFIED AS I-TH DATA SET

CHI-SQUARED RULE
  IS χ_1 < J(1)?  YES; IF NO:
  IS χ_2 < J(2)?  YES; IF NO:
  IS χ_3 < J(3)?  YES; IF NO:
  IS χ_4 < J(4)?  YES

FIGURE 28. EXPANSION OF ALIEN2 DECISION RULE
APPENDIX D

DESCRIPTION OF GEOM2


The distances obtained for the geometrical signature analysis for
LIMMIX processing (GEOM2) are defined as follows. To avoid notational
complexity we will assume that a specific subset of L+1 signatures has
been chosen and relabeled, if necessary, so that their means are denoted
by $A_1, A_2, \ldots, A_{L+1}$ and covariance matrices by
$M_1, M_2, \ldots, M_{L+1}$. Let $H_1$ denote the hyperplane of dimension
$L-1$ through the means $A_2, \ldots, A_{L+1}$ and let $Z$ be the point in
$H_1$ which maximizes the Gaussian density with parameters $A_1$, $M_1$.

Then $d_1$ is defined by

$$d_1^2 = \langle M_1^{-1}(Z - A_1),\; Z - A_1 \rangle .$$

It has been shown by W. Richardson in Reference [2] that $d_1$ may be
computed in the following fashion. Let $T$ denote the $(L+1)\times(L+1)$ matrix
with entries $\langle A_i, M_1^{-1} A_j \rangle$, $1 \le i, j \le L+1$.
Let $\mathbf{1}$ denote the column vector of length $L+1$ with all components
equal to 1, and let $\Gamma_1^*$ be the $(L+2)\times(L+2)$ matrix defined by

$$\Gamma_1^* = \begin{pmatrix} T & \mathbf{1} \\ \mathbf{1}^{T} & 0 \end{pmatrix} .$$

Then $1/d_1^2$ is the $(1,1)$ element of the inverse of $\Gamma_1^*$. More
generally, $1/d_i^2$ is the $(i,i)$ element of the inverse of $\Gamma_i^*$,
$1 \le i \le L+1$.


By some manipulation one can obtain a more convenient form of the
Richardson result for the computation of the $d_i$. [The definitions of the
difference vectors and of the $L \times L$ matrix $\Gamma_1$ are illegible in
the source.] Then $-d_1^2$ is the $(L,L)$ element of the inverse of $\Gamma_1$.
In general, $-d_i^2$ is the $(L,L)$ element of the inverse of $\Gamma_i$.

D.1 OUTLINE OF THE PROGRAM

A. Input control data, signatures. Store covariance terms in the
   GAMMA matrix, and means in the MATRIX matrix.

B. For each signature (say, the m-th signature) repeat through B(3).

   B(1) Assign MATRX2(I,J) ← MATRIX(J,I) - MATRIX(M,I).
        This moves the m-th corner of the signature complex to the origin.

   B(2) Assign MATRIX(I,J) ← MATRX2(J,I). Invert this translated signature
        complex.

   B(3) Assign GAMMA(M,I,J) ← MATRIX(I,J) (GAMMA(M,I,J)^-1) MATRX2(I,J).

C. For each signature (say, the m-th signature) repeat through C(5).

   C(1) For each combination of (NO-1) signatures, (i_1, i_2, ..., i_(NO-1)),
        which does not include the m-th signature, select the following
        elements of GAMMA(M,I,J):

            (i_1, i_1)       ...   (i_1, i_(NO-1))
                .                        .
                .                        .
            (i_(NO-1), i_1)  ...   (i_(NO-1), i_(NO-1))

        and arrange these elements of GAMMA(M,I,J) in a (NO-1) x (NO-1)
        matrix, in the above order; assign these elements to
        MATRIX(I,J).

   C(2) Augment MATRIX(I,J) by placing 1.0 in the (NO-1)th row and
        column, and 0.0 in the ((NO-1), (NO-1)) position.

   C(3) Invert MATRIX(I,J) and place the inverse back into MATRIX(I,J).

   C(4) Assign MATRIX(NO,NO) ← (-MATRIX(NO,NO))^(1/2).

   C(5) The element MATRIX(NO,NO) then contains the desired answer,
        and is printed out.

D. End of Program
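
The core of steps B and C can be sketched in Python as below. This is a reading under stated assumptions, not the GEOM2 code: the Gram matrix of the translated means is taken in the metric of the m-th inverse covariance, steps C(2) to C(5) are read as bordering that Gram matrix with a final row and column of ones and a zero corner and then taking the square root of minus the corner element of the inverse, and the helper names (`invert`, `simplex_distance`) are hypothetical. The row and column indices in the printed outline are partly garbled, so the bordering position is an assumption.

```python
from math import sqrt


def invert(a):
    """Gauss-Jordan inverse with partial pivoting, for small dense matrices."""
    n = len(a)
    aug = [
        [float(x) for x in row] + [1.0 if i == j else 0.0 for j in range(n)]
        for i, row in enumerate(a)
    ]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        if abs(aug[pivot][col]) < 1e-12:
            raise ValueError("matrix is singular")
        aug[col], aug[pivot] = aug[pivot], aug[col]
        p = aug[col][col]
        aug[col] = [v / p for v in aug[col]]
        for r in range(n):
            if r != col:
                f = aug[r][col]
                aug[r] = [v - f * w for v, w in zip(aug[r], aug[col])]
    return [row[n:] for row in aug]


def simplex_distance(mean_m, cov_inv_m, other_means):
    """Distance from signature m to the hyperplane through the other means,
    measured in the metric given by the inverse covariance of signature m."""
    # Step B(1): move the m-th corner of the simplex to the origin.
    diffs = [[a - b for a, b in zip(mean, mean_m)] for mean in other_means]

    def inner(u, v):
        # <u, M^-1 v>
        return sum(
            u[i] * cov_inv_m[i][j] * v[j]
            for i in range(len(u))
            for j in range(len(v))
        )

    k = len(diffs)
    # Gram matrix bordered by ones, with a zero in the corner (step C(2)).
    bordered = [[inner(diffs[i], diffs[j]) for j in range(k)] + [1.0] for i in range(k)]
    bordered.append([1.0] * k + [0.0])
    inverse = invert(bordered)          # step C(3)
    return sqrt(-inverse[k][k])         # steps C(4)-C(5)
```

With an identity covariance the result reduces to ordinary Euclidean distance, which gives a quick sanity check: the origin lies at distance 1/√2 from the line through (1, 0) and (0, 1).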

D.2 HOW TO RUN THE PROGRAM

GEOM2 needs the following input:

NO    - the size of simplex used in measuring the distances, i.e., in
        Figures   and  , NO=3.

NSIG  - the total number of signatures to be input (less than twelve)
NCHAN - the number of channels in the data (less than twelve)

A typical deck might look like

$COMPILE MAD, EXECUTE
      GEOM2.
      E'M
$BINARY
      (GEOM2 BINARY DECK)
$DATA
      NO=5, NSIG=8, NCHAN=4*
      (SIGNATURE DECKS IN STANDARD ERIM FORMAT)
APPENDIX E

DESCRIPTION OF CLUSTR

This program, written in the programming language MAD as implemented
on an IBM 7094 computer, implements the clustering algorithms described
in Section 4.3.

E.1 ALGORITHM ONE

This is the default algorithm. The following variables must be set:

NCHAN  - number of channels, if different from
         4 (must be ≤ 8)

Any POINT control variables (i.e., NC, NV, ICODE(1))

LASTID - ID FIELD of last NSA to be clustered. Default
         is $TTTTTT$

Rare:




If only a very few clusters are produced, it may be necessary to
set RHOSRT to less than 8. With more than 5 channels, increase
RHOSRT as follows:

RHOSRT - this should be set to

         [formula partly illegible in the source: an expression taken over
         i = 1, ..., NCHAN of (range of 2 standard deviations in channel i)]

where CMX = 150 for NCHAN = 1 to 6, 100 for NCHAN = 7, 70 for NCHAN = 8.

Also, set REPLAC = /RHOSRT 

If some clusters contain too many data points, or if there are too
many clusters produced, it may be necessary to reset PERCT1 and PERCT2.

PERCT1 - whenever a cluster holds more than PERCT1 of the points,
         clustering is stopped. Default is .166.

PERCT2 - whenever the largest NUM clusters hold more than PERCT2 of
         the points, clustering is stopped. Default is .666.
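
The two stopping conditions can be expressed as a small Python check. This is a sketch only: the function name `should_stop` is hypothetical, the cluster sizes are assumed to be available as a plain list of point counts, and the default of 15 for NUM follows the text of Section E.3 (the control-variable list gives a different default).

```python
def should_stop(cluster_sizes, num=15, perct1=0.166, perct2=0.666):
    """Return True when either stopping condition is met."""
    total = sum(cluster_sizes)
    if total == 0:
        return False
    ordered = sorted(cluster_sizes, reverse=True)
    # PERCT1: a single cluster holds more than PERCT1 of all points.
    if ordered[0] > perct1 * total:
        return True
    # PERCT2: the NUM largest clusters together hold more than PERCT2.
    return sum(ordered[:num]) > perct2 * total
```

Raising PERCT1 or PERCT2 lets clustering continue longer before it halts; lowering them stops it earlier.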

E.2 ALGORITHM TWO

All variables are the same as in algorithm one, except

INIT   - if INIT=$ON$, points from NSA's with differing FIRST SIX
         CHARACTERS in the ID field will be separated, and these
         first six characters will be used as the name of the
         cluster; this is useful to identify multimodality.

E.3 ALGORITHM THREE

POINT CONTROL VARIABLES (i.e., NC, NV, ICODE(1))

NSPACE - must be set to $ON$

SEQ    - if SEQ=$ON$, updating of means and variances will occur
         after each point, otherwise after each NSA. Recommended
         $ON$ for < 2000 points only.

NNIEB  - if NNIEB=$OFF$, the linear classification rule is used
         for point assignment; otherwise, Euclidean distance is
         used.

NUM    - only the largest NUM clusters are displayed.
         NUM < 30, default is 15.

LASTID - as above.

NCHAN  - as above but < 5.

E.4 HISTOGRAMMING

After the completion of any of the above algorithms, a histogram of
all the major clusters can be obtained at no additional cost.
This type of histogram has the advantage that it represents the data set,
sans noise and mixtures, and it requires no tape mounts.

HIST=$ON$ - default is $ON$

MIN - the smallest data value displayed, default is 1.

MAX - the largest data value displayed, default is 100.
After clusters have been produced, it may be necessary to further
combine them.

This program has the capability of combining signatures together on
the basis of probability of misclassification, and this combining is
stopped whenever more than PERCT1 of the points in all of the signatures have
been combined, or whenever the most populous NUM clusters have more than
PERCT2 of the points in all signatures.

To use this capability, simply specify COMCOV=$ON$, and NSIG = number
of signatures; this is to be followed by the signatures. If it is desired
that signatures with unlike ID fields (i.e., the first six characters of
the ID) not be combined, specify INIT=$OFF$ also. If it is desired to combine
the signatures with weighting by the number of points in the signature,
specify WEIGHT=$ON$.

SAMPLE SET UPS:

CLUSTERING ALGORITHM ONE, HISTOGRAMMING.
CHANNELS 4, 5, 6, 7, 8 used out of 24

$COMPILE MAD
      POINT. (CLUSTR.)
      E'M
$BINARY
      CLUSTR. OBJECT DECK
$DATA
      NC=24, NV=5, NCHAN=5, ICODE(1) = 4,5,6,7,8
      HIST=$ON$*
      NSA'S TO BE CLUSTERED
      NSA= $TTTTTT$*

CLUSTERING OF SIGNATURES, 4 CHANNELS, 28 SIGNATURES, WEIGHTED COMBINING

$COMPILE MAD, EXECUTE
      POINT. (CLUSTR.)
      E'M
$BINARY
      CLUSTR. OBJECT DECK
$DATA
      COMCOV=$ON$, NSIG=28, WEIGHT=$ON$*
      SIGNATURE DECKS

CLUSTERING ALGORITHM TWO, NO HISTOGRAMMING, LAST NSA ID IS $QQ100$,
FOUR CHANNELS

$COMPILE MAD
      POINT. (CLUSTR.)
      E'M
$BINARY
      CLUSTR. OBJECT DECK
$DATA
      INIT=$ON$, LASTID=$QQ100$*
      NSA= $SOY$
      NSA= $SOY$
      NSA= $CORN$         (NSA'S TO BE CLUSTERED)
      etc.
      NSA= $QQ100$*

CLUSTERING ALGORITHM THREE, UPDATE AFTER EACH POINT, LINEAR RULE,
FOUR CHANNELS

$COMPILE MAD
      POINT. (CLUSTR.)
      E'M
$BINARY
      CLUSTR. OBJECT DECK
$DATA
      NSPACE=$ON$, SEQ=$ON$, NNIEB=$OFF$*
      NSA=
      NSA=                (NSA'S TO BE CLUSTERED)
      NSA= $TTTTTT$*
E.5 LIST OF THE CONTROL VARIABLES

VARIABLE   DEFAULT     EXPLANATION
NAME       VALUE

NCHAN      4           Number of channels to be used (≤ 8). (This
                       must always be specified if different from 4.)

INIT       $OFF$       An option to 'name' clusters; see algorithm 2.

NSPACE     $OFF$       To effect use of algorithm 3, set NSPACE=$ON$.

NUM        10          The most populous NUM clusters are used for
                       display (NUM < 30).

LASTID     $TTTTTT$    This is the ID field of the last NSA used for
                       each operation. See examples.

NNIEB      $ON$        This specifies that the distance measure to
                       be used with algorithm 3 is the L2 or Euclidean
                       metric.

SEQ        $OFF$       When SEQ=$ON$, the means and variances in
                       algorithm 3 are updated after each point;
                       when off, after each pass.

CUT(3)     10          Any cluster with > CUT(3) points in it will
                       have a signature deck punched up for it,
                       unless CARDS=$OFF$.

CARDS      $ON$        When CARDS=$OFF$, no signature decks are
                       punched.

PERCT1     .166        In algorithms 1 and 2, whenever a cluster
                       contains more than PERCT1 of the points, the
                       combining is halted.

PERCT2     .666        In algorithms 1 and 2, whenever the NUM
                       largest clusters contain more than PERCT2
                       of the points, combining is halted.

HIST       $ON$        Whenever HIST=$ON$, histograms of the clusters
                       will be made for each channel.

CMX                    Number of approximating cells. Program runs
                       faster with fewer cells, but less accurately.
                       CMX defaults to: 150, for NCHAN=1-6
                                        100, for NCHAN=7
                                         70, for NCHAN=8

MIN        1           Smallest value displayed in histogramming
                       (MIN > 0).

MAX        100         Largest value displayed in histogramming
                       (MAX < 300).

SMX                    Maximum number of points stored, in algorithms 1
                       and 2.
                       SMX defaults to: 800, for NCHAN=1-3
                                        700, for NCHAN=4
                                        500, for NCHAN=5-6
                                        400, for NCHAN=7-8

NSIG       None        Number of signatures to be combined; used
                       only with COMCOV=$ON$.

WEIGHT     $ON$        When WEIGHT=$ON$, combining of signatures is
                       weighted as to the number of points in each
                       signature. Used only with COMCOV=$ON$.

COMCOV     $OFF$       When COMCOV=$ON$, the program will 'cluster'
                       signatures, i.e., combine signatures on the
                       basis of high probability of misclassification.

CUT(2)     2           Any cluster with less than this number of
                       points in it is ignored for purposes of combining.

THETA      4           This is the θ of algorithms 1 and 2 (step 1 of
                       description).

RHOSRT     8.0         This is the $\sigma_{ii}^2(0)$ of algorithms 1 and 2
                       (step 2 of description).

REPLAC     4           Any cluster with a $\sigma_{ii}^2$ < REPLAC has that
                       $\sigma_{ii}^2$ set equal to REPLAC during the
                       computation of the probability of misclassification.
                       It is assumed that for a cluster with a variance
                       less than REPLAC, the estimate of the variance
                       is poor.

SIZE                   This is a vector giving the minimum and maximum
                       values of each channel (SIZE(1) = max value of first
                       channel, SIZE(3) = max value of second channel, etc.);
                       used with algorithm 3 to specify the data space.
APPENDIX F

DESCRIPTION OF NINE-POINT PROGRAM (NPM)

This program, written in the programming language MAD as implemented on
an IBM 7094 computer, performs the algorithm of nine-point mixtures.

It uses the output tape of LIMMIX as input, and determines the amount
of each material found in a region. Program NPM is a module of POINT,
a program which transfers data from the input tape to its modules on a
point-by-point basis. POINT calls its modules as follows: STEP(1) of NPM
is transferred to before any processing takes place, STEP(4) after each
scanner line of data has been processed, and STEP(5) after each area in
the region has been processed. A call is made to the internal function
of NPM for each data point to be processed.

The organization of the program is as follows:

In STEP(1), input of control variables, set-up and initialization is
done.

In STEP(4), after each line is processed, the actual decision rule
is implemented, and a running sum of the amount of each material
found is kept.

In STEP(5), after each area of the region is processed, the ID field
of the POINT control card is examined to determine whether or not
the end of the region has been reached; if so, the totals are printed
out, otherwise nothing is done.

In the internal function, the data which is passed by POINT is stored
into the vectors LINE, LINE1, LINE2, LINE3, LINE4, LINE5, for processing
after the end of a scanner line of data.

An outline of the program is as follows:

A. Read in control data, initialize storage.

B. For each point of data, store DATUM(2) in LINE, DATUM(4) in LINE1,
   DATUM(5) in LINE2, DATUM(6) in LINE3, DATUM(7) in LINE4, DATUM(9) in LINE5.
   Each DATUM is entered into the appropriate LINE vector in the position
   corresponding to the data point's position in the scanner line.

   DATUM(2) is the identity of the recognition
   DATUM(4) is the chi-squared level of the recognition (x 500)
   DATUM(5) is the proportion of material one in the mixture (x 500)
   DATUM(6) is the proportion of material two in the mixture (x 500)
   DATUM(7) is the code giving the identities of the materials in
            the mixture
   DATUM(9) is the chi-squared level of the mixture (x 500)

   DATUM(2) and DATUM(4) are stored only if DATUM(2) < CHICUT, where CHICUT
   is the $\eta_1^2$ of nine-point mixtures.

C. After each line, perform the following for each point of the previous
   data line:

   C(1) Take a vote of the 9 pixels forming a block around the center
        pixel with regard to their identity, which is obtained from the
        LINE vector.

   C(2) Find the largest and second largest vote totals and store these
        in HOLD and HOLD2, with the number of the corresponding signature
        in SAVE and SAVE2.

   C(3) If the vote is > HOWMNY (HOWMNY is the $N_1$ of nine-point mixtures),
        add one to the corresponding signature's total, which is kept in
        SUM. Go to C.

   C(4) If the vote (C(3)) fails, examine LINE1 to see if the center
        pixel's chi-squared level is < CHI2 (CHI2 corresponds to $\eta_2^2$
        in nine-point mixtures). If this chi-squared level is < CHI2, accept
        the center pixel's recognition and add one to the corresponding
        signature's total.

   C(5) If C(4) fails, examine HOLD and HOLD2 to see if they are both
        > CUT2 (CUT2 corresponds to $N_2$ in the nine-point mixtures). If this
        is so, add (HOLD/(HOLD+HOLD2)) to the SAVE signature's total, and
        (HOLD2/(HOLD+HOLD2)) to the SAVE2 signature's total. Go to C.

   C(6) If C(5) fails, check LINE5 to see if it is < CHICT2 (CHICT2
        corresponds to $\eta_3^2$ in nine-point mixtures). If this is so,
        decode the combination number in LINE4 to determine which
        materials are in the mixture and add the correct proportion to
        each of these two materials' totals (obtained from LINE2, LINE3).
        If LINE5 > CHICT2, call the pixel alien, and go to C.

D. After all the lines in the area have been processed, examine the ID field
   of POINT's control card to determine whether or not the next area
   is to be added to this one. If so, go to B; otherwise, print out
   the totals in SUM, zero all sums and go to B.

E. End of Program.
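
The decision cascade of step C can be sketched for a single center pixel as follows. This is an illustrative Python reading, not the NPM code: the function name and argument names are hypothetical, neighbours whose recognition was not stored are represented by `None`, the mixture is passed in already decoded as a mapping from material to proportion, and the strict inequalities follow the outline as printed.

```python
from collections import Counter

ALIEN = "alien"


def nine_point_decision(block_ids, center_id, center_chi2, mixture,
                        mixture_chi2, howmny, chi2_cut, cut2, chict2):
    """Return {material: amount} credited for the center pixel."""
    # C(1)-C(2): vote over the 3 x 3 block and keep the top two totals.
    votes = Counter(i for i in block_ids if i is not None)
    ranked = votes.most_common(2)
    save, hold = ranked[0] if ranked else (None, 0)
    save2, hold2 = ranked[1] if len(ranked) > 1 else (None, 0)

    # C(3): a strong enough majority wins outright.
    if hold > howmny:
        return {save: 1.0}
    # C(4): otherwise accept the center pixel's own recognition if it is good.
    if center_id is not None and center_chi2 < chi2_cut:
        return {center_id: 1.0}
    # C(5): otherwise split between the two leading signatures.
    if hold > cut2 and hold2 > cut2:
        total = hold + hold2
        return {save: hold / total, save2: hold2 / total}
    # C(6): otherwise fall back on the two-material mixture estimate.
    if mixture_chi2 < chict2:
        return dict(mixture)
    return {ALIEN: 1.0}
```

Summing the returned amounts over all pixels of an area gives the per-material totals that the outline accumulates in SUM.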


VARIABLE   DEFAULT   EXPLANATION

CHICUT     None      $\eta_1^2$ of nine-point mixtures

CHI2       None      $\eta_2^2$ of nine-point mixtures

CHICT2     None      $\eta_3^2$ of nine-point mixtures

NUM1       None      $N_1$ of nine-point mixtures

NUM2       1         $N_2$ of nine-point mixtures